Sound signal high frequency compensation method, sound signal post processing method, sound signal decode method, apparatus thereof, program, and storage medium

ABSTRACT

For each frame, an n-th channel compensated decoded sound signal ~X′_n is obtained by compensating the high frequency of an n-th channel purified decoded sound signal ~X_n, which is obtained by performing signal processing in a time domain on an n-th channel decoded sound signal ^X_n, that is, the decoded sound signal of each channel of stereo obtained by decoding a stereo code CS. At this time, for each frame and each channel, an n-th channel high-frequency compensation gain ρ_n is obtained as a value for bringing the high-frequency energy of ~X′_n close to the high-frequency energy of ^X_n. Then, for each frame and each channel, a signal is obtained by adding ~X_n and a signal obtained by multiplying, by the n-th channel high-frequency compensation gain ρ_n, a high-frequency component of either a monaural decoded sound signal obtained by decoding a monaural code CM that is a code different from the stereo code CS, or a signal obtained by upmixing the monaural decoded sound signal for each channel, and the resulting signal is output as the n-th channel compensated decoded sound signal ~X′_n.

TECHNICAL FIELD

The present invention relates to a technique for post-processing a sound signal obtained by decoding a code.

BACKGROUND ART

As a technique for efficiently using a monaural code and a stereo code to encode/decode a stereo sound signal, there is a technique of Patent Literature 1. Patent Literature 1 discloses a scalable encoding/decoding method in which a monaural code representing a monaural signal and a stereo code representing a difference of a stereo signal from the monaural signal are obtained on the encoding side, and on the decoding side, a monaural decoded sound signal and a stereo decoded sound signal are obtained by performing decoding processing corresponding to the encoding side (see FIGS. 7 and 8).

As a technique of encoding, transmitting, and decoding a sound signal by a terminal connected to two lines having different priorities, there is a technique of Patent Literature 2. Patent Literature 2 discloses a technique in which a code for securing minimum quality is included in a packet with high priority and transmitted, and other codes are included in a packet with low priority and transmitted (see FIG. 1 and the like).

In a case where the scalable encoding/decoding method of Patent Literature 1 is used in the system of Patent Literature 2, it is only required to include the monaural code in the packet with high priority and include the stereo code in the packet with low priority on the transmission side. In this manner, on the reception side, a monaural decoded sound signal can be obtained using only the monaural code in a case where only the packet with high priority has arrived, and a stereo decoded sound signal can be obtained using both the monaural code and the stereo code in a case where the packet with low priority has also arrived in addition to the packet with high priority.

CITATION LIST

Patent Literature

-   Patent Literature 1: WO 2006/070751
-   Patent Literature 2: JP 2005-117132 A

SUMMARY OF INVENTION

Technical Problem

In a case where communication is performed by terminals connected to two lines having different priorities, a case where a monaural encoding/decoding method and a stereo encoding/decoding method independent of each other are used instead of using the scalable encoding/decoding method is also assumed. Further, a case of using the monaural encoding/decoding method and the stereo encoding/decoding method independent of each other in one line having the same priority is also assumed. In these cases, on the reception side, only the stereo code is used to obtain the stereo decoded sound signal regardless of whether or not the monaural code has arrived in addition to the stereo code. That is, in a case where stereo decoding independent of monaural decoding is performed on the reception side, even if the monaural code and the stereo code independent of each other derived from the same sound signal are input, there is a problem that the information included in the monaural code is not utilized in processing of obtaining the stereo sound signal output by the device on the reception side.

Therefore, it is an object of the present invention to improve, in a case where there is a sound signal obtained from a different code, a decoded sound signal by using the sound signal obtained from the different code, the different code being different from a code from which the decoded sound signal is obtained and being derived from the same sound signal.

Solution to Problem

For each frame, an n-th channel compensated decoded sound signal ~X′_n is obtained that is a signal obtained by compensating a high frequency of an n-th channel purified decoded sound signal ~X_n obtained by performing signal processing in a time domain on an n-th channel decoded sound signal ^X_n (n is each integer of 1 or more and N or less) that is a decoded sound signal of each channel of stereo obtained by decoding a stereo code CS. At this time, there are executed an n-th channel high-frequency compensation gain estimation step of obtaining, for each frame and each channel, an n-th channel high-frequency compensation gain ρ_n that is a value for bringing high-frequency energy of the n-th channel compensated decoded sound signal ~X′_n close to high-frequency energy of the n-th channel decoded sound signal ^X_n, and an n-th channel high-frequency compensation step of obtaining and outputting, for each frame and each channel, as the n-th channel compensated decoded sound signal ~X′_n, a signal obtained by adding the n-th channel purified decoded sound signal ~X_n and a signal obtained by multiplying, by the n-th channel high-frequency compensation gain ρ_n, a high-frequency component of a monaural decoded sound signal ^X_M that is obtained by decoding a monaural code CM that is a code different from the stereo code CS, or of an n-th channel upmixed monaural decoded sound signal ^X_Mn that is a signal obtained by upmixing the monaural decoded sound signal ^X_M for each channel.
Here, the n-th channel high-frequency compensation step uses a signal obtained by passing the n-th channel upmixed monaural decoded sound signal ^X_Mn through a high-pass filter as an n-th channel compensation signal ^X′_n, and obtains, as the n-th channel compensated decoded sound signal ~X′_n, a sequence based on a value ~x′_n(t)=~x_n(t)+ρ_n×^x′_n(t) obtained, for each corresponding sample t, by adding a sample value ~x_n(t) of the n-th channel purified decoded sound signal ~X_n and a value ρ_n×^x′_n(t) obtained by multiplying the n-th channel high-frequency compensation gain ρ_n by a sample value ^x′_n(t) of the n-th channel compensation signal ^X′_n. The n-th channel high-frequency compensation gain estimation step obtains, as an n-th channel temporary addition signal ~X″_n, a sequence based on a value ~x″_n(t)=~x_n(t)+^x′_n(t) obtained, for each corresponding sample t, by adding the sample value ~x_n(t) of the n-th channel purified decoded sound signal ~X_n and the sample value ^x′_n(t) of the n-th channel compensation signal ^X′_n, and obtains the n-th channel high-frequency compensation gain ρ_n that is a value that is larger as high-frequency energy ~EX_n of the n-th channel purified decoded sound signal ~X_n is smaller than high-frequency energy ^EX_n of the n-th channel decoded sound signal ^X_n, and that is larger as a difference between the high-frequency energy of the n-th channel purified decoded sound signal ~X_n and high-frequency energy of the n-th channel temporary addition signal ~X″_n is smaller than the high-frequency energy ^EX_n of the n-th channel decoded sound signal ^X_n.
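The per-sample arithmetic of the compensation step and of the temporary addition signal can be illustrated with the following minimal Python sketch. It is not the claimed implementation: the function names are hypothetical, the first-difference filter is only a crude stand-in for whatever high-pass filter an actual device would use, and the gain ρ_n is taken as an input because this passage describes its behaviour only qualitatively.

```python
def first_difference_highpass(signal):
    """Crude stand-in high-pass filter (assumption, not from the description):
    y(t) = (x(t) - x(t-1)) / 2, with x(-1) taken as 0."""
    out = []
    prev = 0.0
    for x in signal:
        out.append((x - prev) / 2)
        prev = x
    return out


def temporary_addition(purified, compensation):
    """~x''_n(t) = ~x_n(t) + ^x'_n(t) for each sample t."""
    if len(purified) != len(compensation):
        raise ValueError("signals must have the same number of samples")
    return [p + c for p, c in zip(purified, compensation)]


def compensate_high_frequency(purified, compensation, rho):
    """~x'_n(t) = ~x_n(t) + rho_n * ^x'_n(t) for each sample t."""
    if len(purified) != len(compensation):
        raise ValueError("signals must have the same number of samples")
    return [p + rho * c for p, c in zip(purified, compensation)]


# Example with made-up sample values (one very short "frame").
purified = [1.0, 2.0]          # ~X_n
compensation = [0.5, -0.5]     # ^X'_n, assumed already high-pass filtered
compensated = compensate_high_frequency(purified, compensation, rho=2.0)
```

With ρ_n = 0 the output equals the purified signal, and with ρ_n = 1 it equals the temporary addition signal, which is why the latter is a natural reference point when estimating the gain.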

Advantageous Effects of Invention

According to the present invention, in a case where there is a sound signal obtained from a different code that is different from a code from which a decoded sound signal is obtained and that is derived from the same sound signal, the decoded sound signal can be improved by using the sound signal obtained from the different code.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating an example of a sound signal purification device 1101.

FIG. 2 is a flowchart illustrating an example of processing of the sound signal purification device 1101.

FIG. 3 is a flowchart illustrating an example of processing of an n-th channel purification weight estimation unit 1111-n.

FIG. 4 is a flowchart illustrating an example of processing of the n-th channel purification weight estimation unit 1111-n.

FIG. 5 is a block diagram illustrating an example of a sound signal purification device 1102.

FIG. 6 is a flowchart illustrating an example of processing of the sound signal purification device 1102.

FIG. 7 is a block diagram illustrating an example of a sound signal purification device 1103.

FIG. 8 is a flowchart illustrating an example of processing of the sound signal purification device 1103.

FIG. 9 is a block diagram illustrating an example of a sound signal purification device 1201.

FIG. 10 is a flowchart illustrating an example of processing of the sound signal purification device 1201.

FIG. 11 is a block diagram illustrating an example of a sound signal purification device 1202.

FIG. 12 is a flowchart illustrating an example of processing of the sound signal purification device 1202.

FIG. 13 is a block diagram illustrating an example of a sound signal purification device 1203.

FIG. 14 is a flowchart illustrating an example of processing of the sound signal purification device 1203.

FIG. 15 is a block diagram illustrating an example of a sound signal purification device 1301.

FIG. 16 is a flowchart illustrating an example of processing of the sound signal purification device 1301.

FIG. 17 is a block diagram illustrating an example of a sound signal purification device 1302.

FIG. 18 is a flowchart illustrating an example of processing of the sound signal purification device 1302.

FIG. 19 is a block diagram illustrating an example of a sound signal high-frequency compensation device 201.

FIG. 20 is a flowchart illustrating an example of processing of the sound signal high-frequency compensation device 201/202.

FIG. 21 is a block diagram illustrating an example of a sound signal high-frequency compensation device 202.

FIG. 22 is a block diagram illustrating an example of a sound signal high-frequency compensation device 203.

FIG. 23 is a flowchart illustrating an example of processing of the sound signal high-frequency compensation device 203.

FIG. 24 is a block diagram illustrating an example of a sound signal post-processing device 301.

FIG. 25 is a flowchart illustrating an example of processing of the sound signal post-processing device 301.

FIG. 26 is a block diagram illustrating an example of a sound signal post-processing device 302.

FIG. 27 is a flowchart illustrating an example of processing of the sound signal post-processing device 302.

FIG. 28 is a block diagram illustrating an example of a sound signal decoding device 601.

FIG. 29 is a flowchart illustrating an example of processing of the sound signal decoding device 601.

FIG. 30 is a block diagram illustrating an example of a sound signal decoding device 602.

FIG. 31 is a flowchart illustrating an example of processing of the sound signal decoding device 602.

FIG. 32 is a block diagram illustrating an example of an encoding device 500 and a decoding device 600.

FIG. 33 is a diagram illustrating an example of a functional configuration of a computer that implements respective devices in embodiments of the present invention.

DESCRIPTION OF EMBODIMENTS

Prior to the description of each embodiment, a notation method in this description will be described.

A superscript “^” or “~” such as ^x or ~x for a certain character x should originally be written directly above the “x”, but is written as ^x or ~x due to notational restrictions in this description.

<Encoding Device and Decoding Device to which Present Invention is Applied>

First, before describing each embodiment, an encoding device and a decoding device to which the invention is applied will be described using an example in a case where the number of channels of stereo is two.

<<Encoding Device 500>>

As illustrated in FIG. 32, the encoding device 500 as an application destination includes a downmixing unit 510, a monaural encoding unit 520, and a stereo encoding unit 530. The encoding device 500 encodes an input sound signal in a time domain of two-channel stereo in units of frames having a predetermined time length of 20 ms, for example, to obtain and output a monaural code CM and a stereo code CS to be described later. The sound signal in the time domain of two-channel stereo to be input to the encoding device is, for example, a digital voice signal or acoustic signal obtained by AD conversion of sound of voice, music, or the like collected by each of two microphones, and includes a first channel input sound signal that is an input sound signal of a left channel and a second channel input sound signal that is an input sound signal of a right channel. The monaural code CM and the stereo code CS, which are codes output by the encoding device 500, are input to the decoding device 600. In the encoding device 500, each unit described above performs the following processing for each frame. For example, the frame length is 20 ms, and the sampling frequency is 32 kHz. Assuming that the number of samples per frame is T, T is 640 in this example.
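As a quick check of the frame arithmetic above, the following Python sketch computes T from the frame length and sampling frequency and splits a signal into frames of T samples. The helper names are hypothetical, and dropping a trailing partial frame is an assumption made only to keep the example short.

```python
def samples_per_frame(frame_ms, sampling_hz):
    """Number of samples T in one frame of frame_ms milliseconds."""
    return frame_ms * sampling_hz // 1000


def split_into_frames(signal, T):
    """Yield consecutive full frames of T samples.

    A trailing partial frame is dropped (assumption for this sketch).
    """
    for start in range(0, len(signal) - T + 1, T):
        yield signal[start:start + T]


T = samples_per_frame(20, 32_000)   # 20 ms at 32 kHz
frames = list(split_into_frames(list(range(1500)), T))
```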

[Downmixing Unit 510]

The first channel input sound signal and the second channel input sound signal input to the encoding device 500 are input to the downmixing unit 510. From the first channel input sound signal and the second channel input sound signal, the downmixing unit 510 obtains and outputs a downmixed signal that is a signal obtained by mixing the first channel input sound signal and the second channel input sound signal. The downmixing unit 510 obtains the downmixed signal by, for example, the following first method or second method.

[[First Method for Obtaining Downmixed Signal]]

In the first method, the downmixing unit 510 obtains a sequence based on an average value of sample values for each corresponding sample of a first channel input sound signal X₁={x₁(1), x₁(2), . . . , x₁(T)} and a second channel input sound signal X₂={x₂(1), x₂(2), . . . , x₂(T)} as a downmixed signal X_M={x_M(1), x_M(2), . . . , x_M(T)} (step S510A). That is, when each sample number (index of each sample) is t, x_M(t)=(x₁(t)+x₂(t))/2.

[[Second Method for Obtaining Downmixed Signal]]

In the second method, the downmixing unit 510 performs the following steps S510B-1 to S510B-3.

The downmixing unit 510 first obtains an inter-channel time difference τ from the first channel input sound signal and the second channel input sound signal (step S510B-1). The inter-channel time difference τ is information indicating how far ahead in time the same sound signal is included in one of the first channel input sound signal and the second channel input sound signal relative to the other. The downmixing unit 510 may obtain the inter-channel time difference τ by any known method, and may, for example, use a method exemplified in an inter-channel relationship information estimation unit 1132 described later in a second embodiment. When the downmixing unit 510 uses the method exemplified in the inter-channel relationship information estimation unit 1132 described later in the second embodiment, the inter-channel time difference τ is a positive value in a case where the same sound signal is included in the first channel input sound signal before the second channel input sound signal, and the inter-channel time difference τ is a negative value in a case where the same sound signal is included in the second channel input sound signal before the first channel input sound signal.

Next, the downmixing unit 510 obtains a correlation value between a sample sequence of the first channel input sound signal and a sample sequence of the second channel input sound signal at a position shifted backward from the sample sequence by the inter-channel time difference τ, as an inter-channel correlation coefficient γ (step S510B-2).

Next, the downmixing unit 510 performs weighted averaging on the first channel input sound signal and the second channel input sound signal so that the input sound signal of a preceding channel out of the first channel input sound signal X₁={x₁(1), x₁(2), . . . , x₁(T)} and the second channel input sound signal X₂={x₂(1), x₂(2), . . . , x₂(T)} is included more heavily in the downmixed signal X_M={x_M(1), x_M(2), . . . , x_M(T)} as the inter-channel correlation coefficient γ is larger, to obtain and output the downmixed signal (step S510B-3). For example, the downmixing unit 510 is only required to weight and add the first channel input sound signal x₁(t) and the second channel input sound signal x₂(t) for each corresponding sample number t using a weight determined by the inter-channel correlation coefficient γ to obtain the downmixed signal x_M(t). Specifically, the downmixing unit 510 is only required to obtain, as the downmixed signal x_M(t), x_M(t)=((1+γ)/2)×x₁(t)+((1−γ)/2)×x₂(t) in a case where the inter-channel time difference τ is a positive value, that is, in a case where the first channel is preceding, and x_M(t)=((1−γ)/2)×x₁(t)+((1+γ)/2)×x₂(t) in a case where the inter-channel time difference τ is a negative value, that is, in a case where the second channel is preceding. In a case where the inter-channel time difference τ is zero, that is, in a case where neither channel is preceding, the downmixing unit 510 is only required to set x_M(t)=(x₁(t)+x₂(t))/2 obtained by averaging the first channel input sound signal x₁(t) and the second channel input sound signal x₂(t) as the downmixed signal x_M(t) for each sample number t.
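The two downmixing methods can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, and τ and γ are passed in as already-estimated values because their estimation is described elsewhere. It also assumes 0 ≤ γ ≤ 1 so that both weights are non-negative, which the text above does not state explicitly.

```python
def downmix_average(x1, x2):
    """First method: x_M(t) = (x1(t) + x2(t)) / 2."""
    return [(a + b) / 2 for a, b in zip(x1, x2)]


def downmix_weighted(x1, x2, tau, gamma):
    """Second method (step S510B-3), given tau and gamma from earlier steps.

    tau > 0: first channel precedes, so it gets the weight (1 + gamma) / 2.
    tau < 0: second channel precedes, so it gets the weight (1 + gamma) / 2.
    tau == 0: neither precedes, so the plain average is used.
    """
    if tau > 0:
        w1, w2 = (1 + gamma) / 2, (1 - gamma) / 2
    elif tau < 0:
        w1, w2 = (1 - gamma) / 2, (1 + gamma) / 2
    else:
        w1 = w2 = 0.5
    return [w1 * a + w2 * b for a, b in zip(x1, x2)]


# Made-up two-sample frame: with gamma = 0.5 and the first channel
# preceding, the weights are 0.75 and 0.25.
x_m = downmix_weighted([1.0, 0.0], [0.0, 1.0], tau=3, gamma=0.5)
```

Note that with γ = 0 the weighted method reduces to the plain average regardless of the sign of τ, while with γ = 1 the downmixed signal is the preceding channel alone.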

[Monaural Encoding Unit 520]

The downmixed signal output by the downmixing unit 510 is input to the monaural encoding unit 520. The monaural encoding unit 520 encodes the input downmixed signal with b_M bits by a predetermined encoding method to obtain and output the monaural code CM. That is, the b_M-bit monaural code CM is obtained from the input downmixed signal X_M={x_M(1), x_M(2), . . . , x_M(T)} of T samples and is output. Any encoding method may be used, and for example, it is only required to use an encoding method such as the 3GPP EVS standard.

[Stereo Encoding Unit 530]

The first channel input sound signal and the second channel input sound signal input to the encoding device 500 are input to the stereo encoding unit 530. The stereo encoding unit 530 encodes the first channel input sound signal and the second channel input sound signal with b_S bits in total by a predetermined encoding method to obtain and output the stereo code CS. That is, the stereo code CS of b_S bits in total is obtained from the first channel input sound signal X₁={x₁(1), x₁(2), . . . , x₁(T)} of the T samples and the second channel input sound signal X₂={x₂(1), x₂(2), . . . , x₂(T)} of the T samples and is output. Any method may be used as the encoding method, and for example, a stereo encoding method compatible with the stereo decoding method of the MPEG-4 AAC standard may be used, or an encoding method for independently encoding each of the input first channel input sound signal and the input second channel input sound signal may be used. Regardless of which encoding method is used, it is only required to use a code obtained by combining all codes obtained by encoding as the stereo code CS.

Since the monaural code CM is a code obtained by the monaural encoding unit 520 as described above and the stereo code CS is a code obtained by the stereo encoding unit 530 as described above, the monaural code CM and the stereo code CS are different codes that do not include overlapping codes. That is, the monaural code CM is a code different from the stereo code CS, and the stereo code CS is a code different from the monaural code CM.

<<Decoding Device 600>>

As illustrated in FIG. 32, the decoding device 600 as an application destination includes a monaural decoding unit 610 and a stereo decoding unit 620. The decoding device 600 decodes the input monaural code CM in units of frames having the same time length as that of the corresponding encoding device 500 to obtain and output a monaural decoded sound signal that is a decoded sound signal in the monaural time domain, and decodes the input stereo code CS to obtain and output a first channel decoded sound signal and a second channel decoded sound signal that are decoded sound signals in the two-channel stereo time domain. In the decoding device 600, each unit described above performs the following processing for each frame.

[Monaural Decoding Unit 610]

The monaural code CM input to the decoding device 600 is input to the monaural decoding unit 610. The monaural decoding unit 610 decodes the monaural code CM by a predetermined decoding method to obtain and output the monaural decoded sound signal ^X_M={^x_M(1), ^x_M(2), . . . , ^x_M(T)}. That is, the monaural decoding unit 610 decodes the monaural code CM, which is a code different from the stereo code CS, without using the stereo code CS or information obtained by decoding the stereo code CS, to obtain the monaural decoded sound signal ^X_M. As the predetermined decoding method, a decoding method corresponding to the encoding method used by the monaural encoding unit 520 of the corresponding encoding device 500 is used. The number of bits of the monaural code CM is b_M.

[Stereo Decoding Unit 620]

The stereo code CS input to the decoding device 600 is input to the stereo decoding unit 620. The stereo decoding unit 620 decodes the stereo code CS by a predetermined decoding method to obtain and output a first channel decoded sound signal ^X₁={^x₁(1), ^x₁(2), . . . , ^x₁(T)} that is a decoded sound signal of the left channel and a second channel decoded sound signal ^X₂={^x₂(1), ^x₂(2), . . . , ^x₂(T)} that is a decoded sound signal of the right channel. That is, the stereo decoding unit 620 decodes the stereo code CS, which is a code different from the monaural code CM, without using the monaural code CM or information obtained by decoding the monaural code CM, to obtain the first channel decoded sound signal ^X₁ and the second channel decoded sound signal ^X₂. As the predetermined decoding method, a decoding method corresponding to the encoding method used by the stereo encoding unit 530 of the corresponding encoding device 500 is used. The total number of bits of the stereo code CS is b_S.

Since the encoding device 500 and the decoding device 600 operate as described above, the monaural code CM is a code derived from the same sound signal as the sound signal from which the stereo code CS is derived (that is, the first channel input sound signal X₁ and the second channel input sound signal X₂ input to the encoding device 500), but is a code different from the code from which the first channel decoded sound signal ^X₁ and the second channel decoded sound signal ^X₂ are obtained (that is, the stereo code CS).

First Embodiment

A sound signal purification device of a first embodiment improves a decoded sound signal of each channel of the stereo by using a monaural decoded sound signal obtained from a code different from a code from which the decoded sound signal is obtained. Hereinafter, a sound signal purification device of the first embodiment will be described using an example in a case where the number of channels of the stereo is two.

<<Sound Signal Purification Device 1101>>

As illustrated in FIG. 1, the sound signal purification device 1101 of the first embodiment includes a first channel purification weight estimation unit 1111-1, a first channel signal purification unit 1121-1, a second channel purification weight estimation unit 1111-2, and a second channel signal purification unit 1121-2. The sound signal purification device 1101 obtains and outputs, for each channel of the stereo in units of frames having a predetermined time length of 20 ms, for example, a purified decoded sound signal, which is a sound signal obtained by improving the decoded sound signal of the channel, from the monaural decoded sound signal and the decoded sound signal of the channel. The decoded sound signals of the respective channels input in units of frames to the sound signal purification device 1101 are, for example, the first channel decoded sound signal ^X₁={^x₁(1), ^x₁(2), . . . , ^x₁(T)} of the T samples and the second channel decoded sound signal ^X₂={^x₂(1), ^x₂(2), . . . , ^x₂(T)} of the T samples obtained by the stereo decoding unit 620 of the decoding device 600 described above decoding the b_S-bit stereo code CS, which is a code different from the monaural code CM, without using the monaural code CM or information obtained by decoding the monaural code CM. The monaural decoded sound signal input in units of frames to the sound signal purification device 1101 is, for example, the monaural decoded sound signal ^X_M={^x_M(1), ^x_M(2), . . . , ^x_M(T)} of the T samples obtained by the monaural decoding unit 610 of the decoding device 600 described above decoding the b_M-bit monaural code CM, which is a code different from the stereo code CS, without using the stereo code CS or information obtained by decoding the stereo code CS. The monaural code CM is a code derived from the same sound signal as the sound signal from which the stereo code CS is derived (that is, the first channel input sound signal X₁ and the second channel input sound signal X₂ input to the encoding device 500), but is a code different from the code from which the first channel decoded sound signal ^X₁ and the second channel decoded sound signal ^X₂ are obtained (that is, the stereo code CS). Assuming that the channel number n (channel index n) of the first channel is 1 and the channel number n of the second channel is 2, the sound signal purification device 1101 performs steps S1111-n and S1121-n illustrated in FIG. 2 for each channel for each frame. That is, hereinafter, unless otherwise specified, for each unit or step to which “-n” is attached, a unit or step corresponding to each channel exists; specifically, there are a unit or step for the first channel to which “-1” is attached instead of “-n” and a unit or step for the second channel to which “-2” is attached instead of “-n”. Similarly, in the following description, unless otherwise specified, a suffix or the like with a notation of “n” indicates that there is one corresponding to each channel number; specifically, there are one corresponding to the first channel to which “1” is added in place of “n” and one corresponding to the second channel to which “2” is added in place of “n”.

[n-Th Channel Purification Weight Estimation Unit 1111-n]

An n-th channel purification weight estimation unit 1111-n obtains and outputs an n-th channel purification weight α_n (step S1111-n). The n-th channel purification weight estimation unit 1111-n obtains the n-th channel purification weight α_n by a method based on a principle of minimizing a quantization error. The principle of minimizing the quantization error and the method based on this principle will be described later. The n-th channel decoded sound signal ^X_n={^x_n(1), ^x_n(2), . . . , ^x_n(T)} input to the sound signal purification device 1101 and the monaural decoded sound signal ^X_M={^x_M(1), ^x_M(2), . . . , ^x_M(T)} input to the sound signal purification device 1101 are input to the n-th channel purification weight estimation unit 1111-n as necessary, as indicated by a one-dot chain line in FIG. 1. The n-th channel purification weight α_n obtained by the n-th channel purification weight estimation unit 1111-n is a value of 0 or more and 1 or less. However, since the n-th channel purification weight estimation unit 1111-n obtains the n-th channel purification weight α_n for each frame by the method to be described later, the n-th channel purification weight α_n is not 0 or 1 in every frame. That is, there is a frame in which the n-th channel purification weight α_n is a value larger than 0 and smaller than 1. In other words, in at least one of all the frames, the n-th channel purification weight α_n is a value larger than 0 and smaller than 1.

[n-Th Channel Signal Purification Unit 1121-n]

The n-th channel decoded sound signal ^X_n={^x_n(1), ^x_n(2), . . . , ^x_n(T)} input to the sound signal purification device 1101, the monaural decoded sound signal ^X_M={^x_M(1), ^x_M(2), . . . , ^x_M(T)} input to the sound signal purification device 1101, and the n-th channel purification weight α_n output by the n-th channel purification weight estimation unit 1111-n are input to the n-th channel signal purification unit 1121-n. For each corresponding sample t, the n-th channel signal purification unit 1121-n obtains and outputs a sequence based on a value ~x_n(t) obtained by adding a value α_n×^x_M(t) obtained by multiplying the n-th channel purification weight α_n by a sample value ^x_M(t) of the monaural decoded sound signal ^X_M and a value (1−α_n)×^x_n(t) obtained by multiplying a value (1−α_n) obtained by subtracting the n-th channel purification weight α_n from 1 by a sample value ^x_n(t) of the n-th channel decoded sound signal ^X_n, as an n-th channel purified decoded sound signal ~X_n={~x_n(1), ~x_n(2), . . . , ~x_n(T)} (step S1121-n). That is, ~x_n(t)=(1−α_n)×^x_n(t)+α_n×^x_M(t).
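The weighted addition of step S1121-n can be sketched in Python as follows. This is a minimal illustration, not the device itself: the function name is hypothetical, and the purification weight α_n is supplied as an input because its estimation is handled by the separate weight estimation unit.

```python
def purify_channel(decoded_n, decoded_mono, alpha_n):
    """Step S1121-n: ~x_n(t) = (1 - alpha_n) * ^x_n(t) + alpha_n * ^x_M(t).

    decoded_n    -- samples of the n-th channel decoded sound signal ^X_n
    decoded_mono -- samples of the monaural decoded sound signal ^X_M
    alpha_n      -- purification weight, 0 <= alpha_n <= 1
    """
    if not 0.0 <= alpha_n <= 1.0:
        raise ValueError("alpha_n must be between 0 and 1 inclusive")
    if len(decoded_n) != len(decoded_mono):
        raise ValueError("signals must have the same number of samples")
    return [(1 - alpha_n) * xn + alpha_n * xm
            for xn, xm in zip(decoded_n, decoded_mono)]


# Made-up sample values: an equal mix of the channel and the monaural signal.
purified = purify_channel([1.0, 2.0], [3.0, 6.0], alpha_n=0.5)
```

At α_n = 0 the output is the stereo-decoded channel unchanged and at α_n = 1 it is the monaural decoded signal, so intermediate weights blend the two decoded signals sample by sample.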

[Principle of Minimizing Quantization Error]

Hereinafter, the principle of minimizing the quantization error will be described. Depending on the encoding method/decoding method used by the stereo encoding unit 530 and the stereo decoding unit 620, the number of bits used for encoding the input sound signal of each channel may not be explicitly determined, but in the following description, it is assumed that the number of bits used for encoding the input sound signal X_n of the n-th channel is b_n.

The outline of the numbers of bits of the codes and the signals in the processes of the respective units of each device described above is as follows. The stereo encoding unit 530 of the encoding device 500 to which the sound signal purification device 1101 is applied encodes the input sound signal X_(n)={x_(n)(1), x_(n)(2), . . . , x_(n)(T)} of the n-th channel to obtain a b_(n)-bit code. The monaural encoding unit 520 of the encoding device 500 to which the sound signal purification device 1101 is applied encodes the downmixed signal X_(M)={x_(M)(1), x_(M)(2), . . . , x_(M)(T)} to obtain a b_(M)-bit code. The stereo decoding unit 620 of the decoding device 600 to which the sound signal purification device 1101 is applied obtains the decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} of the n-th channel from the b_(n)-bit code. The monaural decoding unit 610 of the decoding device 600 to which the sound signal purification device 1101 is applied obtains the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} from the b_(M)-bit code. For each corresponding sample t, the n-th channel signal purification unit 1121-n of the sound signal purification device 1101 obtains a sequence based on a value ^(˜)x_(n)(t)=(1−α_(n))×^({circumflex over ( )})x_(n)(t)+α_(n)×^({circumflex over ( )})x_(M)(t) obtained by adding a value α_(n)×^({circumflex over ( )})x_(M)(t) obtained by multiplying the n-th channel purification weight α_(n) by the sample value ^({circumflex over ( )})x_(M)(t) of the monaural decoded sound signal ^({circumflex over ( )})X_(M) and a value (1−α_(n))×^({circumflex over ( )})x_(n)(t) obtained by multiplying a value (1−α_(n)) obtained by subtracting the n-th channel purification weight α_(n) from 1 by the sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), as the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)}. The sound signal purification device 1101 should be designed so that the energy of the quantization error included in the n-th channel purified decoded sound signal ^(˜)X_(n) obtained by the above processing is small.

In many cases, the energy of a quantization error included in a decoded signal obtained by encoding and decoding an input signal (hereinafter also referred to as a “quantization error caused by encoding” for convenience) is roughly proportional to the energy of the input signal, and tends to decrease exponentially as the number of bits per sample used for encoding increases. Therefore, the average energy per sample of the quantization error caused by encoding of the input sound signal X_(n) of the n-th channel can be estimated as the following Expression (1) using a positive number σ_(n) ². Further, the average energy per sample of the quantization error caused by encoding of the downmixed signal X_(M) can be estimated as the following Expression (2) using a positive number σ_(M) ².

$\begin{matrix}\lbrack {{Math}.1} \rbrack &  \\{\sigma_{n}^{2}2^{- \frac{2b_{n}}{T}}} & (1)\end{matrix}$

$\begin{matrix}\lbrack {{Math}.2} \rbrack &  \\{\sigma_{M}^{2}2^{- \frac{2b_{M}}{T}}} & (2)\end{matrix}$

Here, it is assumed that the input sound signal X_(n)={x_(n)(1), x_(n)(2), . . . , x_(n)(T)} of the n-th channel and the downmixed signal X_(M)={x_(M)(1), x_(M)(2), . . . , x_(M)(T)} have respective sample values close enough to be regarded as the same sequence. This condition corresponds to, for example, a case where the input sound signal X₁={x₁(1), x₁(2), . . . , x₁(T)} of the first channel and the input sound signal X₂={x₂(1), x₂(2), . . . , x₂(T)} of the second channel are obtained by collecting a sound emitted by a sound source at an equal distance from the two microphones in an environment with little background noise or reverberation. Since the energy of the signal consisting of the values obtained by multiplying each sample value of the decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} of the n-th channel by (1−α_(n)) can be expressed as (1−α_(n))² times the energy of the downmixed signal, σ_(n) ² of Expression (1) can be replaced with (1−α_(n))²×σ_(M) ² using σ_(M) ² described above, and thus the average energy per sample of the quantization error included in the sequence {(1−α_(n))×^({circumflex over ( )})x_(n)(1), (1−α_(n))×^({circumflex over ( )})x_(n)(2), . . . , (1−α_(n))×^({circumflex over ( )})x_(n)(T)} of the values obtained by multiplying each sample value of the decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} of the n-th channel by (1−α_(n)) can be estimated as the following Expression (3).

$\begin{matrix}\lbrack {{Math}.3} \rbrack &  \\{( {1 - \alpha_{n}} )^{2}\sigma_{M}^{2}2^{- \frac{2b_{n}}{T}}} & (3)\end{matrix}$

Further, the average energy per sample of the quantization error included in the sequence of values {α_(n)×^({circumflex over ( )})x_(M)(1), α_(n)×^({circumflex over ( )})x_(M)(2), . . . , α_(n)×^({circumflex over ( )})x_(M)(T)} obtained by multiplying each sample value of the monaural decoded sound signal ^({circumflex over ( )})X_(M) by α_(n) can be estimated as the following Expression (4).

$\begin{matrix}\lbrack {{Math}.4} \rbrack &  \\{\alpha_{n}^{2}\sigma_{M}^{2}2^{- \frac{2b_{M}}{T}}} & (4)\end{matrix}$

Assuming that the quantization error caused by encoding of the input sound signal of the n-th channel and the quantization error caused by encoding of the downmixed signal have no correlation with each other, the average energy per sample of the quantization error included in the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)} is estimated by the sum of Expressions (3) and (4). The n-th channel purification weight α_(n) that minimizes the energy of the quantization error included in the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)} is obtained as the following Expression (5).

$\begin{matrix}\lbrack {{Math}.5} \rbrack &  \\{\alpha_{n} = \frac{2^{- \frac{2b_{n}}{T}}}{2^{- \frac{2b_{n}}{T}} + 2^{- \frac{2b_{M}}{T}}}} & (5)\end{matrix}$

That is, the n-th channel purification weight estimation unit 1111-n is only required to obtain the n-th channel purification weight α_(n) by Expression (5) in order to minimize the quantization error included in the n-th channel purified decoded sound signal under the condition that the input sound signal X_(n)={x_(n)(1), x_(n)(2), . . . , x_(n)(T)} of the n-th channel and the downmixed signal X_(M)={x_(M)(1), x_(M)(2), . . . , x_(M)(T)} have respective sample values close enough to be regarded as the same sequence.
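For reference, the step from the sum of Expressions (3) and (4) to Expression (5) can be written out as follows. The symbol J and the shorthand A and B are introduced here only for this derivation and do not appear elsewhere in the description.

$\begin{aligned} J(\alpha_{n}) &= (1 - \alpha_{n})^{2}\sigma_{M}^{2}A + \alpha_{n}^{2}\sigma_{M}^{2}B, \qquad A = 2^{- \frac{2b_{n}}{T}},\quad B = 2^{- \frac{2b_{M}}{T}} \\ \frac{dJ}{d\alpha_{n}} &= - 2(1 - \alpha_{n})\sigma_{M}^{2}A + 2\alpha_{n}\sigma_{M}^{2}B = 0 \\ \alpha_{n} &= \frac{A}{A + B} = \frac{2^{- \frac{2b_{n}}{T}}}{2^{- \frac{2b_{n}}{T}} + 2^{- \frac{2b_{M}}{T}}} \end{aligned}$

Since the second derivative $2\sigma_{M}^{2}(A + B)$ is positive, this stationary point is the minimum of the estimated quantization error energy.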

[Method Based on Principle of Minimizing Quantization Error]

Hereinafter, specific examples of methods for obtaining the n-th channel purification weight α_(n) on the basis of the principle of minimizing the quantization error described above will be described.

First Example

A first example is an example of obtaining the n-th channel purification weight α_(n) by the principle of minimizing the quantization error described above. The n-th channel purification weight estimation unit 1111-n of the first example obtains the n-th channel purification weight α_(n) by Expression (5) using the number of samples T per frame, the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS, and the number of bits b_(M) of the monaural code CM. The method by which the n-th channel purification weight estimation unit 1111-n specifies the number of bits b_(n) and the number of bits b_(M) is common to all the examples, and thus will be described after the seventh example, which is the last specific example.
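Expression (5) as used in the first example can be sketched as follows. The function name `purification_weight` and the numeric values are assumptions chosen only for illustration.

```python
def purification_weight(b_n, b_m, T):
    """Purification weight of Expression (5).

    b_n: number of bits of the stereo code CS corresponding to the n-th channel
    b_m: number of bits of the monaural code CM
    T:   number of samples per frame
    """
    a = 2.0 ** (-2.0 * b_n / T)  # error factor of the n-th channel decoded signal
    b = 2.0 ** (-2.0 * b_m / T)  # error factor of the monaural decoded signal
    return a / (a + b)


# Illustrative values: T = 160 samples per frame.
alpha_equal = purification_weight(320, 320, 160)       # same bit counts
alpha_more_stereo = purification_weight(320, 160, 160)  # b_n larger than b_M
```

When b_(n) and b_(M) are equal the weight is 0.5, and the weight moves toward 0 as b_(n) grows relative to b_(M), that is, as the n-th channel decoded sound signal becomes the more accurate of the two signals.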

Second Example

A second example is an example of obtaining the n-th channel purification weight α_(n) having a feature similar to the n-th channel purification weight α_(n) obtained in the first example. The n-th channel purification weight estimation unit 1111-n of the second example uses at least the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS and the number of bits b_(M) of the monaural code CM to obtain, as the n-th channel purification weight α_(n), a value that is larger than 0 and smaller than 1, is 0.5 when b_(n) and b_(M) are equal, is closer to 0 than 0.5 as b_(n) is larger than b_(M), and is closer to 1 than 0.5 as b_(M) is larger than b_(n).
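The second example only states properties that the weight must satisfy and does not prescribe a formula. As a purely hypothetical illustration, the ratio b_(M)/(b_(n)+b_(M)) is one simple function with these properties when both bit counts are positive; it is an assumption made here, not a formula taken from this description.

```python
def purification_weight_ratio(b_n, b_m):
    """Hypothetical weight with the properties required by the second example.

    For positive b_n and b_m the result lies strictly between 0 and 1,
    equals 0.5 when b_n == b_m, and decreases toward 0 as b_n grows.
    """
    if b_n <= 0 or b_m <= 0:
        raise ValueError("bit counts must be positive")
    return b_m / (b_n + b_m)


w_equal = purification_weight_ratio(320, 320)
w_more_stereo = purification_weight_ratio(480, 160)
w_more_mono = purification_weight_ratio(160, 480)
```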

Third Example

A third example is an example of obtaining the n-th channel purification weight α_(n) in consideration of a case where the input sound signal X_(n)={x_(n)(1), x_(n)(2), . . . , x_(n)(T)} of the n-th channel and the downmixed signal X_(M)={x_(M)(1), x_(M)(2), . . . , x_(M)(T)} cannot be regarded as the same sequence. In a case where the input sound signal X_(n)={x_(n)(1), x_(n)(2), . . . , x_(n)(T)} of the n-th channel and the downmixed signal X_(M)={x_(M)(1), x_(M)(2), . . . , x_(M)(T)} do not have respective sample values close enough to be regarded as the same sequence, the signal obtained by the weighted average (1−α_(n))×^({circumflex over ( )})x_(n)(t)+α_(n)×^({circumflex over ( )})x_(M)(t) has a waveform different from that of the input sound signal X_(n)={x_(n)(1), x_(n)(2), . . . , x_(n)(T)} of the n-th channel even in a case where there is no quantization error. Therefore, in a case where there is no correlation at all between the input sound signal X_(n)={x_(n)(1), x_(n)(2), . . . , x_(n)(T)} of the n-th channel and the downmixed signal X_(M)={x_(M)(1), x_(M)(2), . . . , x_(M)(T)}, accuracy is better maintained by using the n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} without change as the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)}, without performing the weighted average processing described above.

Therefore, in consideration of a case where the input sound signal X_(n)={x_(n)(1), x_(n)(2), . . . , x_(n)(T)} of the n-th channel and the downmixed signal X_(M)={x_(M)(1), x_(M)(2), . . . , x_(M)(T)} cannot be regarded as the same sequence, the n-th channel signal purification unit 1121-n is preferably capable of obtaining the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)} by the weighted average (1−α_(n))×^({circumflex over ( )})x_(n)(t)+α_(n)×^({circumflex over ( )})x_(M)(t) based on the n-th channel purification weight α_(n), which is closer to the value obtained by the above Expression (5) as the correlation is higher and closer to zero as the correlation is lower, according to the correlation between the n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} and the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)}. As the above correlation, for example, a normalized inner product value r_(n) for the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} can be used as expressed by the following Expression (6).

$\begin{matrix}\lbrack {{Math}.6} \rbrack &  \\{r_{n} = \frac{{\sum}_{t = 1}^{T}{{\hat{x}}_{n}(t)}{{\hat{x}}_{M}(t)}}{{\sum}_{t = 1}^{T}{{\hat{x}}_{M}(t)}{{\hat{x}}_{M}(t)}}} & (6)\end{matrix}$

Thus, the n-th channel purification weight estimation unit 1111-n of the third example obtains the n-th channel purification weight α_(n) by the following Expression (7) using the normalized inner product value r_(n) obtained by Expression (6).

$\begin{matrix}\lbrack {{Math}.7} \rbrack &  \\{\alpha_{n} = {\frac{2^{- \frac{2b_{n}}{T}}}{2^{- \frac{2b_{n}}{T}} + 2^{- \frac{2b_{M}}{T}}}r_{n}}} & (7)\end{matrix}$

For example, the n-th channel purification weight estimation unit 1111-n performs steps S1111-1-n to S1111-3-n illustrated in FIG. 3. The n-th channel purification weight estimation unit 1111-n first obtains the normalized inner product value r_(n) by Expression (6) from the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M) (step S1111-1-n). The n-th channel purification weight estimation unit 1111-n also obtains a correction coefficient c_(n) by the following Expression (8) from the number of samples T per frame, the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS, and the number of bits b_(M) of the monaural code CM (step S1111-2-n).

$\begin{matrix}\lbrack {{Math}.8} \rbrack &  \\{c_{n} = \frac{2^{- \frac{2b_{n}}{T}}}{2^{- \frac{2b_{n}}{T}} + 2^{- \frac{2b_{M}}{T}}}} & (8)\end{matrix}$

Next, the n-th channel purification weight estimation unit 1111-n obtains a value c_(n)×r_(n) obtained by multiplying the normalized inner product value r_(n) obtained in step S1111-1-n by the correction coefficient c_(n) obtained in step S1111-2-n as the n-th channel purification weight α_(n) (step S1111-3-n). That is, the n-th channel purification weight estimation unit 1111-n of the third example obtains the value c_(n)×r_(n) obtained by multiplying the correction coefficient c_(n) obtained by Expression (8) using the number of samples T per frame, the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS, and the number of bits b_(M) of the monaural code CM by the normalized inner product value r_(n) for the monaural decoded sound signal ^({circumflex over ( )})X_(M) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), as the n-th channel purification weight α_(n).
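Steps S1111-1-n to S1111-3-n of the third example can be sketched as follows. The function names are assumptions introduced for illustration, and returning 0 for a monaural frame with zero energy is an added guard that this description does not specify.

```python
def normalized_inner_product(x_n_hat, x_m_hat):
    """Expression (6): inner product of ^X_n and ^X_M normalized by the energy of ^X_M."""
    num = sum(xn * xm for xn, xm in zip(x_n_hat, x_m_hat))
    den = sum(xm * xm for xm in x_m_hat)
    # Assumption: treat a silent monaural frame as uncorrelated (r_n = 0).
    return num / den if den > 0.0 else 0.0


def correction_coefficient(b_n, b_m, T):
    """Expression (8): same form as the weight of Expression (5)."""
    a = 2.0 ** (-2.0 * b_n / T)
    b = 2.0 ** (-2.0 * b_m / T)
    return a / (a + b)


def purification_weight_third(x_n_hat, x_m_hat, b_n, b_m):
    """Expression (7): alpha_n = c_n * r_n, with T taken as the frame length."""
    T = len(x_n_hat)
    r_n = normalized_inner_product(x_n_hat, x_m_hat)   # step S1111-1-n
    c_n = correction_coefficient(b_n, b_m, T)          # step S1111-2-n
    return c_n * r_n                                   # step S1111-3-n


# Identical frames give r_n = 1; orthogonal frames give r_n = 0.
alpha_same = purification_weight_third([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], 48, 48)
alpha_orth = purification_weight_third([1.0, 0.0], [0.0, 1.0], 48, 48)
```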

Fourth Example

A fourth example is an example of obtaining the n-th channel purification weight α_(n) having a feature similar to the n-th channel purification weight α_(n) obtained in the third example. The n-th channel purification weight estimation unit 1111-n of the fourth example uses at least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), the monaural decoded sound signal ^({circumflex over ( )})X_(M), the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS, and the number of bits b_(M) of the monaural code CM to obtain, as the n-th channel purification weight α_(n), the value c_(n)×r_(n) obtained by multiplying r_(n) by the correction coefficient c_(n). Here, r_(n) is a value of 0 or more and 1 or less that is closer to 1 as the correlation between the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M) is higher and closer to 0 as the correlation is lower, and the correction coefficient c_(n) is a value larger than 0 and smaller than 1 that is 0.5 when b_(n) and b_(M) are equal, closer to 0 than 0.5 as b_(n) is larger than b_(M), and closer to 1 than 0.5 as b_(n) is smaller than b_(M).

Fifth Example

A fifth example is an example in which, instead of the normalized inner product value of the third example, a value that also takes into account the inputs of past frames is used. In the fifth example, a rapid variation between frames of the n-th channel purification weight α_(n) is reduced, and noise generated in the purified decoded sound signal due to the variation is reduced. For example, as illustrated in FIG. 4, the n-th channel purification weight estimation unit 1111-n of the fifth example performs the following steps S1111-11-n to S1111-13-n, and steps S1111-2-n and S1111-3-n similar to those of the third example.

The n-th channel purification weight estimation unit 1111-n first obtains an inner product value E_(n)(0) to be used in the current frame by the following Expression (9) using the n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)}, the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)}, and the inner product value E_(n)(−1) that has been used in the previous frame (step S1111-11-n).

$\begin{matrix}\lbrack {{Math}.9} \rbrack &  \\{{E_{n}(0)} = {{\epsilon_{n}{E_{n}( {- 1} )}} + {\frac{( {1 - \epsilon_{n}} )}{T}{\sum\limits_{t = 1}^{T}{{{\hat{x}}_{n}(t)}{{\hat{x}}_{M}(t)}}}}}} & (9)\end{matrix}$

Here, ε_(n) is a predetermined value larger than 0 and smaller than 1, and is stored in advance in the n-th channel purification weight estimation unit 1111-n. Note that the n-th channel purification weight estimation unit 1111-n stores the obtained inner product value E_(n)(0) in the n-th channel purification weight estimation unit 1111-n in order to use this inner product value E_(n)(0) as the “inner product value E_(n)(−1) that has been used in the previous frame” in the next frame.

The n-th channel purification weight estimation unit 1111-n also obtains energy E_(M)(0) of the monaural decoded sound signal to be used in the current frame by the following Expression (10) using the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} and energy E_(M)(−1) of the monaural decoded sound signal that has been used in the previous frame (step S1111-12-n).

$\begin{matrix}\lbrack {{Math}.10} \rbrack &  \\{{E_{M}(0)} = {{\epsilon_{M}{E_{M}( {- 1} )}} + {\frac{( {1 - \epsilon_{M}} )}{T}{\sum\limits_{t = 1}^{T}{{{\hat{x}}_{M}(t)}{{\hat{x}}_{M}(t)}}}}}} & (10)\end{matrix}$

Here, ε_(M) is a predetermined value larger than 0 and smaller than 1, and is stored in advance in the n-th channel purification weight estimation unit 1111-n. Note that the n-th channel purification weight estimation unit 1111-n stores the obtained energy E_(M)(0) of the monaural decoded sound signal in the n-th channel purification weight estimation unit 1111-n in order to use this energy E_(M)(0) as the “energy E_(M)(−1) of the monaural decoded sound signal that has been used in the previous frame” in the next frame. Note that, since the values of E_(M)(0) are the same in the first channel purification weight estimation unit 1111-1 and the second channel purification weight estimation unit 1111-2, E_(M)(0) may be obtained by either one of the first channel purification weight estimation unit 1111-1 and the second channel purification weight estimation unit 1111-2, and the obtained E_(M)(0) may be used by the other.

Next, the n-th channel purification weight estimation unit 1111-n obtains the normalized inner product value r_(n) by the following Expression (11) using the inner product value E_(n)(0) to be used in the current frame obtained in step S1111-11-n and the energy E_(M)(0) of the monaural decoded sound signal to be used in the current frame obtained in step S1111-12-n (step S1111-13-n).

[Math. 11]

r_(n)=E_(n)(0)/E_(M)(0)  (11)

The n-th channel purification weight estimation unit 1111-n also obtains the correction coefficient c_(n) by Expression (8) (step S1111-2-n). Next, the n-th channel purification weight estimation unit 1111-n obtains the value c_(n)×r_(n) obtained by multiplying the normalized inner product value r_(n) obtained in step S1111-13-n by the correction coefficient c_(n) obtained in step S1111-2-n as the n-th channel purification weight α_(n) (step S1111-3-n).

That is, the n-th channel purification weight estimation unit 1111-n of the fifth example obtains, as the n-th channel purification weight α_(n), the value c_(n)×r_(n) obtained by multiplying the normalized inner product value r_(n) by the correction coefficient c_(n). Here, the normalized inner product value r_(n) is obtained by Expression (11) using the inner product value E_(n)(0), which is obtained by Expression (9) using each sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), each sample value ^({circumflex over ( )})x_(M)(t) of the monaural decoded sound signal ^({circumflex over ( )})X_(M), and the inner product value E_(n)(−1) of the previous frame, and the energy E_(M)(0) of the monaural decoded sound signal, which is obtained by Expression (10) using each sample value ^({circumflex over ( )})x_(M)(t) of the monaural decoded sound signal ^({circumflex over ( )})X_(M) and the energy E_(M)(−1) of the monaural decoded sound signal of the previous frame. The correction coefficient c_(n) is obtained by Expression (8) using the number of samples T per frame, the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS, and the number of bits b_(M) of the monaural code CM.

Note that, as ε_(n) and ε_(M) described above are closer to 1, the normalized inner product value r_(n) is more strongly influenced by the n-th channel decoded sound signal and the monaural decoded sound signal of past frames, and the variation between frames of the normalized inner product value r_(n) and of the n-th channel purification weight α_(n) obtained with the normalized inner product value r_(n) becomes smaller.
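The frame-by-frame recursion of Expressions (9) to (11) can be sketched as a small stateful estimator. The class name, the initial values of 0 for E_(n)(−1) and E_(M)(−1) before the first frame, and the zero-energy guard are assumptions made for this illustration; this description does not specify them.

```python
class SmoothedWeightEstimator:
    """Sketch of the fifth example for one channel n."""

    def __init__(self, eps_n, eps_m):
        self.eps_n = eps_n  # predetermined value, 0 < eps_n < 1
        self.eps_m = eps_m  # predetermined value, 0 < eps_m < 1
        self.e_n = 0.0      # E_n(-1); initial value 0 is an assumption
        self.e_m = 0.0      # E_M(-1); initial value 0 is an assumption

    def update(self, x_n_hat, x_m_hat, b_n, b_m):
        T = len(x_n_hat)
        inner = sum(xn * xm for xn, xm in zip(x_n_hat, x_m_hat))
        energy = sum(xm * xm for xm in x_m_hat)
        # Expression (9) and Expression (10); results are kept for the next frame.
        self.e_n = self.eps_n * self.e_n + (1.0 - self.eps_n) / T * inner
        self.e_m = self.eps_m * self.e_m + (1.0 - self.eps_m) / T * energy
        # Expression (11), with an added guard for zero energy.
        r_n = self.e_n / self.e_m if self.e_m > 0.0 else 0.0
        # Expression (8).
        a = 2.0 ** (-2.0 * b_n / T)
        b = 2.0 ** (-2.0 * b_m / T)
        c_n = a / (a + b)
        return c_n * r_n


est = SmoothedWeightEstimator(eps_n=0.5, eps_m=0.5)
alpha_frame1 = est.update([1.0, 1.0], [1.0, 1.0], 64, 64)
alpha_frame2 = est.update([-1.0, -1.0], [1.0, 1.0], 64, 64)
```

In the second call the current frame alone would give r_(n)=−1, but the contribution of the previous frame pulls the smoothed value to −1/3, which illustrates how the recursion limits the variation of the weight between frames.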

Sixth Example

For example, in a case where the sound of voice, music, or the like included in the first channel input sound signal is different from the sound of voice, music, or the like included in the second channel input sound signal, the monaural decoded sound signal includes both components of the first channel input sound signal and components of the second channel input sound signal. For this reason, there is a problem that, as the value used as the first channel purification weight α₁ is larger, more of the sound derived from the input sound signal of the second channel, which should not originally be heard, is included in the first channel purified decoded sound signal. Similarly, there is a problem that, as the value used as the second channel purification weight α₂ is larger, more of the sound derived from the input sound signal of the first channel, which should not originally be heard, is included in the second channel purified decoded sound signal. Accordingly, in consideration of auditory quality, the n-th channel purification weight estimation unit 1111-n of a sixth example obtains, as the n-th channel purification weight α_(n), a value smaller than the n-th channel purification weight α_(n) of each channel obtained by each example described above. For example, the n-th channel purification weight estimation unit 1111-n of the sixth example based on the third example or the fifth example obtains a value λ×c_(n)×r_(n) obtained by multiplying the normalized inner product value r_(n) and the correction coefficient c_(n) described in the third example, or the normalized inner product value r_(n) and the correction coefficient c_(n) described in the fifth example, by λ that is a predetermined value larger than 0 and smaller than 1, as the n-th channel purification weight α_(n).

Seventh Example

The auditory quality problem described in the sixth example occurs when the correlation between the first channel input sound signal and the second channel input sound signal is small, and this problem is unlikely to occur when the correlation between the first channel input sound signal and the second channel input sound signal is large. Thus, the n-th channel purification weight estimation unit 1111-n of a seventh example uses the inter-channel correlation coefficient γ, which is a correlation coefficient between the first channel decoded sound signal and the second channel decoded sound signal, instead of the predetermined value λ of the sixth example, and gives priority to reducing the energy of the quantization error included in the purified decoded sound signal as the correlation between the first channel decoded sound signal and the second channel decoded sound signal is larger, and gives priority to suppressing deterioration of the auditory quality as the correlation between the first channel decoded sound signal and the second channel decoded sound signal is smaller. Hereinafter, differences of the seventh example from the third and fifth examples will be described.

[[[Inter-Channel Relationship Information Estimation Unit 1131 of Seventh Example]]]

The sound signal purification device 1101 of the seventh example also includes an inter-channel relationship information estimation unit 1131 as indicated by a broken line in FIG. 1. At least the first channel decoded sound signal input to the sound signal purification device 1101 and the second channel decoded sound signal input to the sound signal purification device 1101 are input to the inter-channel relationship information estimation unit 1131. The inter-channel relationship information estimation unit 1131 of the seventh example obtains and outputs the inter-channel correlation coefficient γ by using at least the first channel decoded sound signal and the second channel decoded sound signal (step S1131). The inter-channel correlation coefficient γ is a correlation coefficient between the first channel decoded sound signal and the second channel decoded sound signal, and may be a correlation coefficient γ₀ between a sample sequence {^({circumflex over ( )})x₁(1), ^({circumflex over ( )})x₁(2), . . . , ^({circumflex over ( )})x₁(T)} of the first channel decoded sound signal and a sample sequence {^({circumflex over ( )})x₂(1), ^({circumflex over ( )})x₂(2), . . . , ^({circumflex over ( )})x₂(T)} of the second channel decoded sound signal, or may be a correlation coefficient considering a time difference, for example, a correlation coefficient γ_(τ) between a sample sequence of the first channel decoded sound signal and a sample sequence of the second channel decoded sound signal at a position shifted backward from the sample sequence by τ samples. Note that the inter-channel relationship information estimation unit 1131 may obtain the inter-channel correlation coefficient γ by any known method or by a method described with the inter-channel relationship information estimation unit 1132 of the second embodiment described later. Note that, depending on the method of obtaining the inter-channel correlation coefficient γ, as indicated by a two-dot chain line in FIG. 1, the monaural decoded sound signal input to the sound signal purification device 1101 is also input to the inter-channel relationship information estimation unit 1131.

This τ is information corresponding to a difference (what is called an arrival time difference) between an arrival time from a sound source mainly emitting a sound in a certain space to the microphone for the first channel and an arrival time from the sound source to the microphone for the second channel when it is assumed that a sound signal obtained by performing AD conversion on a sound collected by the microphone for the first channel arranged in the certain space is the first channel input sound signal X₁ and a sound signal obtained by performing AD conversion on a sound collected by the microphone for the second channel arranged in the certain space is the second channel input sound signal X₂. Hereinafter, this τ is referred to as an inter-channel time difference. The inter-channel relationship information estimation unit 1131 may obtain the inter-channel time difference τ from the first channel decoded sound signal ^({circumflex over ( )})X₁ that is a decoded sound signal corresponding to the first channel input sound signal X₁ and the second channel decoded sound signal ^({circumflex over ( )})X₂ that is a decoded sound signal corresponding to the second channel input sound signal X₂ by any known method, for example, by the method described with the inter-channel relationship information estimation unit 1132 of the second embodiment. That is, the correlation coefficient γ_(τ) described above is information corresponding to a correlation coefficient between a sound signal obtained by collecting the sound that reaches the microphone for the first channel from a sound source and a sound signal obtained by collecting the sound that reaches the microphone for the second channel from the sound source.

[[[n-Th Channel Purification Weight Estimation Unit 1111-n of Seventh Example]]]

Instead of step S1111-3-n of the third example and the fifth example, the n-th channel purification weight estimation unit 1111-n of the seventh example obtains a value γ×c_(n)×r_(n) obtained by multiplying the normalized inner product value r_(n) obtained in step S1111-1-n of the third example or step S1111-13-n of the fifth example, the correction coefficient c_(n) obtained in step S1111-2-n, and the inter-channel correlation coefficient γ obtained in step S1131 as the n-th channel purification weight α_(n) (step S1111-3′-n). That is, the n-th channel purification weight estimation unit 1111-n of the seventh example obtains the value γ×c_(n)×r_(n) obtained by multiplying the normalized inner product value r_(n) and the correction coefficient c_(n) described in the third example, or the normalized inner product value r_(n) and the correction coefficient c_(n) described in the fifth example, by the inter-channel correlation coefficient γ that is the correlation coefficient between the first channel decoded sound signal and the second channel decoded sound signal, as the n-th channel purification weight α_(n).
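Step S1111-3′-n can be sketched as follows. Since the inter-channel correlation coefficient γ may be obtained by any known method, the normalized cross-correlation without time shift used below for γ₀ is only one possible choice and is an assumption of this sketch, as are the function names and the zero-energy guard.

```python
import math


def inter_channel_correlation(x1_hat, x2_hat):
    """One possible gamma_0: normalized cross-correlation of the two decoded channels."""
    num = sum(a * b for a, b in zip(x1_hat, x2_hat))
    den = math.sqrt(sum(a * a for a in x1_hat) * sum(b * b for b in x2_hat))
    # Assumption: return 0 when either channel has zero energy.
    return num / den if den > 0.0 else 0.0


def purification_weight_seventh(gamma, c_n, r_n):
    """alpha_n = gamma * c_n * r_n (step S1111-3'-n)."""
    return gamma * c_n * r_n


gamma_same = inter_channel_correlation([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
gamma_orth = inter_channel_correlation([1.0, 0.0], [0.0, 1.0])
alpha_seventh = purification_weight_seventh(gamma_same, 0.5, 0.8)
```

With strongly correlated channels the weight stays close to c_(n)×r_(n), and with uncorrelated channels it approaches 0, so that the n-th channel decoded sound signal is used almost unchanged.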

Note that, when obtaining the n-th channel purification weight α_(n) in the third example to the seventh example, the n-th channel purification weight estimation unit 1111-n may use signals obtained by filtering each of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M) instead of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M) themselves. The filter may be, for example, a predetermined low-pass filter or a linear prediction filter using a linear prediction coefficient obtained by analyzing the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) or the monaural decoded sound signal ^({circumflex over ( )})X_(M). By performing the filtering, it is possible to weight each frequency component of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M), and it is possible to increase the contribution of an audibly important frequency component when obtaining the n-th channel purification weight α_(n).

[Method for Specifying Number of Bits b_(M) of Monaural Code CM]

In a case where the number of bits b_(M) of the monaural code CM in the decoding method used by the monaural decoding unit 610 is the same in all the frames (that is, in a case where the decoding method used by the monaural decoding unit 610 is a decoding method of a fixed bit rate), it is only required that the number of bits b_(M) of the monaural code CM is stored in a storage unit, which is not illustrated, in the n-th channel purification weight estimation unit 1111-n. In a case where the number of bits b_(M) of the monaural code CM in the decoding method used by the monaural decoding unit 610 is different depending on the frame (that is, in a case where the decoding method used by the monaural decoding unit 610 is a decoding method of a variable bit rate), it is only required that the monaural decoding unit 610 outputs the number of bits b_(M) of the monaural code CM, and that the number of bits b_(M) is input to the n-th channel purification weight estimation unit 1111-n.

[Method for Specifying Number of Bits b_(n) in Number of Bits of Stereo Code CS]

In a case where the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS in the decoding method used by the stereo decoding unit 620 is the same in all the frames, it is only required that the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS is stored in the storage unit, which is not illustrated, in the n-th channel purification weight estimation unit 1111-n. In a case where the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS in the decoding method used by the stereo decoding unit 620 is different depending on the frame, it is only required that the stereo decoding unit 620 outputs the number of bits b_(n), and that the number of bits b_(n) is input to the n-th channel purification weight estimation unit 1111-n. In a case where the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS in the decoding method used by the stereo decoding unit 620 is not explicitly determined, the n-th channel purification weight estimation unit 1111-n is only required to use, for example, a value obtained by the following first method or second method as b_(n). Note that, in both the first method and the second method, in a case where the number of bits b_(S) of the stereo code CS in the decoding method used by the stereo decoding unit 620 is the same in all the frames, it is only required that the number of bits b_(S) of the stereo code CS is stored in the storage unit, which is not illustrated, in the n-th channel purification weight estimation unit 1111-n, and in a case where the number of bits b_(S) of the stereo code CS in the decoding method used by the stereo decoding unit 620 is different depending on the frame, it is only required that the stereo decoding unit 620 outputs the number of bits b_(S), and that the number of bits b_(S) is input to the n-th channel purification weight estimation unit 1111-n.

[[First Method for Specifying Number of Bits b_(n) in Number of Bits of Stereo Code CS]]

The n-th channel purification weight estimation unit 1111-n uses, as b_(n), a value obtained by dividing the number of bits b_(s) of the stereo code CS by the number of channels (that is, in a case of two-channel stereo, b_(s)/2, or one half of b_(s)). That is, in a case where the number of bits b_(s) of the stereo code CS in the decoding method used by the stereo decoding unit 620 is the same in all the frames, it is only required that a value obtained by dividing the number of bits b_(s) of the stereo code CS by the number of channels is stored as the number of bits b_(n) in the storage unit, which is not illustrated, in the n-th channel purification weight estimation unit 1111-n. In a case where the number of bits b_(s) of the stereo code CS in the decoding method used by the stereo decoding unit 620 is different depending on the frame, it is only required that the n-th channel purification weight estimation unit 1111-n obtains a value obtained by dividing the number of bits b_(s) by the number of channels as b_(n).

[[Second Method for Specifying Number of Bits b_(n) in Number of Bits of Stereo Code CS]]

The n-th channel purification weight estimation unit 1111-n obtains, using the decoded sound signals of all channels input to the sound signal purification device 1101, a value obtained by adding a value obtained by dividing the number of bits b_(s) of the stereo code CS by the number of channels and a value proportional to a logarithmic value of a ratio of the energy of the decoded sound signal ^({circumflex over ( )})X_(n) of the n-th channel to a geometric mean of the energy of the decoded sound signals of all the channels as b_(n). In general, in stereo encoding, compression can be efficiently performed by assigning the number of bits proportional to a logarithmic value of energy of each signal to the input sound signal of the each channel. Therefore, the second method is to estimate the number of bits b_(n) on the assumption that the above-described number of bits is allocated in the stereo code CS also in the encoding method used by the stereo encoding unit 530 and the decoding method used by the stereo decoding unit 620. More specifically, for example, the n-th channel purification weight estimation unit 1111-n is only required to obtain the number of bits b_(n) by the following Expression (12) using energy e₁ of the first channel decoded sound signal ^({circumflex over ( )})X₁ and energy e₂ of the second channel decoded sound signal ^({circumflex over ( )})X₂.

[Math. 12] $\begin{matrix}{b_{n} = {\frac{b_{s}}{2} + {\frac{1}{2}\log_{2}\frac{e_{n}}{\sqrt{e_{1}e_{2}}}}}} & (12)\end{matrix}$
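As an illustration only, the first method and the second method for two-channel stereo can be sketched as follows. The function names and the fallback used when an energy is not positive are assumptions of this sketch and do not appear in the description; the energies e₁ and e₂ are taken as given inputs.

```python
import math


def bits_first_method(b_s, num_channels=2):
    """First method: divide the stereo-code bits b_s evenly over the channels."""
    return b_s / num_channels


def bits_second_method(b_s, e1, e2):
    """Second method for two channels, following Expression (12).

    e1 and e2 are the energies of the first and second channel decoded
    sound signals. Returns the pair (b_1, b_2).
    """
    if e1 <= 0.0 or e2 <= 0.0:
        # Assumption of this sketch: use the even split when the
        # logarithm in Expression (12) would be undefined.
        return b_s / 2, b_s / 2
    geometric_mean = math.sqrt(e1 * e2)
    b1 = b_s / 2 + 0.5 * math.log2(e1 / geometric_mean)
    b2 = b_s / 2 + 0.5 * math.log2(e2 / geometric_mean)
    return b1, b2
```

With this form the two estimates always sum to b_(s), because the two logarithmic terms cancel, and they reduce to the first method when the two energies are equal.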

Modification Example of First Embodiment

Even in a case where the sound signal purification device 1101 uses the inter-channel correlation coefficient γ, in a case where the stereo decoding unit 620 of the decoding device 600 obtains the inter-channel correlation coefficient γ, the sound signal purification device 1101 need not include the inter-channel relationship information estimation unit 1131, and the inter-channel correlation coefficient γ obtained by the stereo decoding unit 620 of the decoding device 600 may be input to the sound signal purification device 1101, so that the sound signal purification device 1101 uses the input inter-channel correlation coefficient γ.

In addition, even in a case where the sound signal purification device 1101 uses the inter-channel correlation coefficient γ, when an inter-channel relationship information code CC obtained and output by an inter-channel relationship information encoding unit, which is not illustrated, included in the encoding device 500 described above includes a code representing the inter-channel correlation coefficient γ, the sound signal purification device 1101 need not include the inter-channel relationship information estimation unit 1131, the code representing the inter-channel correlation coefficient γ included in the inter-channel relationship information code CC may be input to the sound signal purification device 1101, the sound signal purification device 1101 may include an inter-channel relationship information decoding unit, which is not illustrated, and the inter-channel relationship information decoding unit may decode the code representing the inter-channel correlation coefficient γ to obtain and output the inter-channel correlation coefficient γ.

Second Embodiment

Similarly to the sound signal purification device of the first embodiment, a sound signal purification device of a second embodiment also improves the decoded sound signal of the each channel of the stereo by using a monaural decoded sound signal obtained from a code different from the code from which the decoded sound signal is obtained. The sound signal purification device of the second embodiment is different from the sound signal purification device of the first embodiment in that a signal obtained by upmixing the monaural decoded sound signal for the each channel is used instead of the monaural decoded sound signal itself. Hereinafter, regarding the sound signal purification device of the second embodiment, differences from the sound signal purification device of the first embodiment will be mainly described using an example in a case where the number of channels of the stereo is two.

<<Sound Signal Purification Device 1102>>

As illustrated in FIG. 5, the sound signal purification device 1102 of the second embodiment includes the inter-channel relationship information estimation unit 1132, a monaural decoded sound upmixing unit 1172, a first channel purification weight estimation unit 1112-1, a first channel signal purification unit 1122-1, a second channel purification weight estimation unit 1112-2, and a second channel signal purification unit 1122-2. For the each frame, as illustrated in FIG. 6, the sound signal purification device 1102 performs steps S1132 and S1172, and steps S1112-n and S1122-n for the each channel.

[Inter-Channel Relationship Information Estimation Unit 1132]

At least the first channel decoded sound signal ^({circumflex over ( )})X₁ input to the sound signal purification device 1102 and the second channel decoded sound signal ^({circumflex over ( )})X₂ input to the sound signal purification device 1102 are input to the inter-channel relationship information estimation unit 1132. The inter-channel relationship information estimation unit 1132 obtains and outputs inter-channel relationship information by using at least the first channel decoded sound signal ^({circumflex over ( )})X₁ and the second channel decoded sound signal ^({circumflex over ( )})X₂ (step S1132). The inter-channel relationship information is information indicating a relationship between the channels of the stereo. Examples of the inter-channel relationship information are an inter-channel time difference τ and an inter-channel correlation coefficient γ. The inter-channel relationship information estimation unit 1132 may obtain a plurality of types of inter-channel relationship information and, for example, may obtain the inter-channel time difference τ and the inter-channel correlation coefficient γ.

The inter-channel time difference τ is information corresponding to a difference (what is called an arrival time difference) between an arrival time from a sound source mainly emitting a sound in a certain space to the microphone for the first channel and an arrival time from the sound source to the microphone for the second channel when it is assumed that a sound signal obtained by performing AD conversion on a sound collected by the microphone for the first channel arranged in the certain space is the first channel input sound signal X₁ and a sound signal obtained by performing AD conversion on a sound collected by the microphone for the second channel arranged in the certain space is the second channel input sound signal X₂. Note that, in order to include not only the arrival time difference but also information corresponding to which microphone is reached earlier in the inter-channel time difference τ, it is assumed that the inter-channel time difference τ can take a positive value or a negative value with any one of the sound signals as a reference. The inter-channel relationship information estimation unit 1132 obtains the inter-channel time difference τ from the first channel decoded sound signal ^({circumflex over ( )})X₁ that is a decoded sound signal corresponding to the first channel input sound signal X₁ and the second channel decoded sound signal ^({circumflex over ( )})X₂ that is a decoded sound signal corresponding to the second channel input sound signal X₂. That is, the inter-channel time difference τ obtained by the inter-channel relationship information estimation unit 1132 is information indicating how far ahead the same sound signal is included in the first channel decoded sound signal ^({circumflex over ( )})X₁ or the second channel decoded sound signal ^({circumflex over ( )})X₂. Hereinafter, in a case where the same sound signal is included in the first channel decoded sound signal ^({circumflex over ( )})X₁ earlier than in the second channel decoded sound signal ^({circumflex over ( )})X₂, the first channel is also referred to as preceding, and in a case where the same sound signal is included earlier in the second channel decoded sound signal ^({circumflex over ( )})X₂ than in the first channel decoded sound signal ^({circumflex over ( )})X₁, the second channel is also referred to as preceding.

The inter-channel relationship information estimation unit 1132 may obtain the inter-channel time difference τ by any known method. For example, the inter-channel relationship information estimation unit 1132 calculates a value (hereinafter, referred to as a correlation value) γ_(cand) representing the magnitude of a correlation between the sample sequence of the first channel decoded sound signal ^({circumflex over ( )})X₁ and the sample sequence of the second channel decoded sound signal ^({circumflex over ( )})X₂ at a position shifted backward from the sample sequence by the number of possible samples τ_(cand) for each number of possible samples τ_(cand) from τ_(max) to τ_(min) determined in advance (for example, τ_(max) is a positive number, and τ_(min) is a negative number), and obtains the number of possible samples τ_(cand) with which the correlation value γ_(cand) is maximized as the inter-channel time difference τ. That is, in this example, the inter-channel time difference τ is a positive value in a case where the first channel is preceding, and the inter-channel time difference τ is a negative value in a case where the second channel is preceding. In other words, the absolute value |τ| of the inter-channel time difference τ is the number of samples |τ| corresponding to the time difference between the first channel and the second channel, and is a value (the number of preceding samples) indicating how much the preceding channel is preceding the other channel. Further, whether the inter-channel time difference τ is a positive value or a negative value is information indicating which channel of the first channel and the second channel is preceding. Therefore, the inter-channel relationship information estimation unit 1132 may obtain information indicating the number of samples |τ| corresponding to the time difference between the first channel and the second channel and information indicating which channel of the first channel and the second channel is preceding, instead of the inter-channel time difference τ.

For example, in a case where the inter-channel relationship information estimation unit 1132 calculates the correlation value γ_(cand) using only the samples in the frame, in a case where τ_(cand) is a positive value, it is only required to calculate, as the correlation value γ_(cand), an absolute value of a correlation coefficient between a partial sample sequence {^({circumflex over ( )})x₂(1+τ_(cand)), ^({circumflex over ( )})x₂(2+τ_(cand)), . . . , ^({circumflex over ( )})x₂(T)} of the second channel decoded sound signal ^({circumflex over ( )})X₂ and a partial sample sequence {^({circumflex over ( )})x₁(1), ^({circumflex over ( )})x₁(2), . . . , ^({circumflex over ( )})x₁(T−τ_(cand))} of the first channel decoded sound signal ^({circumflex over ( )})X₁ at a position shifted forward from the partial sample sequence by the number of possible samples τ_(cand), and in a case where τ_(cand) is a negative value, it is only required to calculate, as the correlation value γ_(cand), an absolute value of a correlation coefficient between a partial sample sequence {^({circumflex over ( )})x₁(1−τ_(cand)), ^({circumflex over ( )})x₁(2−τ_(cand)), . . . , ^({circumflex over ( )})x₁(T)} of the first channel decoded sound signal ^({circumflex over ( )})X₁ and a partial sample sequence {^({circumflex over ( )})x₂(1), ^({circumflex over ( )})x₂(2), . . . , ^({circumflex over ( )})x₂(T+τ_(cand))} of the second channel decoded sound signal ^({circumflex over ( )})X₂ at a position shifted forward from the partial sample sequence by the number of possible samples (−τ_(cand)). Of course, one or more samples of the past decoded sound signals continuous with the sample sequence of the decoded sound signal of the current frame may also be used in order to calculate the correlation value γ_(cand), and in this case, the inter-channel relationship information estimation unit 1132 is only required to store the sample sequence of the decoded sound signal of a past frame for a predetermined number of frames in the storage unit, which is not illustrated, in the inter-channel relationship information estimation unit 1132.
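The in-frame search described above may be sketched, purely as an illustration, as follows. The function names are hypothetical, the candidates are restricted to integers, and the correlation coefficient is taken to be the mean-removed (Pearson) coefficient; the description does not fix these choices, so they are assumptions of this sketch.

```python
import numpy as np


def correlation_value(x1, x2, tau_cand):
    """Absolute correlation coefficient for one candidate shift tau_cand,
    using only the samples of the current frame (0-based arrays)."""
    T = len(x1)
    if tau_cand >= 0:
        a = x2[tau_cand:]        # x2(1+tau_cand), ..., x2(T)
        b = x1[:T - tau_cand]    # x1(1), ..., x1(T-tau_cand)
    else:
        a = x1[-tau_cand:]       # x1(1-tau_cand), ..., x1(T)
        b = x2[:T + tau_cand]    # x2(1), ..., x2(T+tau_cand)
    # Assumption: Pearson-style coefficient with the mean removed.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    if denom == 0.0:
        return 0.0
    return float(abs(np.dot(a, b)) / denom)


def estimate_time_difference(x1, x2, tau_min, tau_max):
    """Return (tau, gamma): the integer candidate in [tau_min, tau_max]
    that maximizes the correlation value, and that maximum value."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if len(x1) != len(x2):
        raise ValueError("both channels must have the same frame length")
    if max(abs(tau_min), abs(tau_max)) >= len(x1):
        raise ValueError("candidate shifts must be smaller than the frame length")
    candidates = range(tau_min, tau_max + 1)
    values = [correlation_value(x1, x2, c) for c in candidates]
    best = int(np.argmax(values))
    return candidates[best], values[best]
```

In this sketch a positive result means that the first channel is preceding, which matches the sign convention stated above for this example.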

Furthermore, for example, instead of the absolute value of the correlation coefficient, the correlation value γ_(cand) may be calculated using the phase information of the signal as follows. In this example, the inter-channel relationship information estimation unit 1132 first performs Fourier transform on the first channel decoded sound signal ^({circumflex over ( )})X₁={^({circumflex over ( )})x₁(1), ^({circumflex over ( )})x₁(2), . . . , ^({circumflex over ( )})x₁(T)} as in the following Expression (21), to thereby obtain a frequency spectrum f₁(k) at each frequency k from zero to T−1.

[Math. 13] $\begin{matrix}{{f_{1}(k)} = {\frac{1}{\sqrt{T}}{\sum\limits_{t = 0}^{T - 1}{{{\overset{\hat{}}{x}}_{1}( {t + 1} )}e^{{- j}\frac{2\pi{kt}}{T}}}}}} & (21)\end{matrix}$

The inter-channel relationship information estimation unit 1132 also performs Fourier transform on the second channel decoded sound signal ^({circumflex over ( )})X₂={^({circumflex over ( )})x₂(1), ^({circumflex over ( )})x₂(2), . . . , ^({circumflex over ( )})x₂(T)} as in the following Expression (22), to thereby obtain a frequency spectrum f₂(k) at each frequency k from zero to T−1.

[Math. 14] $\begin{matrix}{{f_{2}(k)} = {\frac{1}{\sqrt{T}}{\sum\limits_{t = 0}^{T - 1}{{{\overset{\hat{}}{x}}_{2}( {t + 1} )}e^{{- j}\frac{2\pi{kt}}{T}}}}}} & (22)\end{matrix}$

Next, the inter-channel relationship information estimation unit 1132 obtains the spectrum φ(k) of the phase difference at each frequency k by the following Expression (23) using the frequency spectra f₁(k) and f₂(k) of each frequency k from zero to T−1.

[Math. 15] $\begin{matrix}{{\phi(k)} = \frac{{f_{1}(k)}/{|{f_{1}(k)}|}}{{f_{2}(k)}/{|{f_{2}(k)}|}}} & (23)\end{matrix}$

Next, the inter-channel relationship information estimation unit 1132 performs inverse Fourier transform on the spectrum of the phase difference from zero to T−1, to thereby obtain a phase difference signal ψ(τ_(cand)) for each number of possible samples τ_(cand) from τ_(max) to τ_(min) as in the following Expression (24).

[Math. 16] $\begin{matrix}{{\psi( \tau_{cand} )} = {\frac{1}{\sqrt{T}}{\sum\limits_{k = 0}^{T - 1}{{\phi(k)}e^{j\frac{2\pi k\tau_{cand}}{T}}}}}} & (24)\end{matrix}$

The absolute value of the phase difference signal ψ(τ_(cand)) obtained here represents a kind of correlation corresponding to the likelihood of the time difference between the first channel decoded sound signal ^({circumflex over ( )})X₁={^({circumflex over ( )})x₁(1), ^({circumflex over ( )})x₁(2), . . . , ^({circumflex over ( )})x₁(T)} and the second channel decoded sound signal ^({circumflex over ( )})X₂={^({circumflex over ( )})x₂(1), ^({circumflex over ( )})x₂(2), . . . , ^({circumflex over ( )})x₂(T)}. Accordingly, next, the inter-channel relationship information estimation unit 1132 obtains an absolute value of the phase difference signal ψ(τ_(cand)) with respect to each number of possible samples τ_(cand) as a correlation value γ_(cand). Next, the inter-channel relationship information estimation unit 1132 obtains the number of possible samples τ_(cand) with which the correlation value γ_(cand), which is the absolute value of the phase difference signal ψ(τ_(cand)), is maximized as the inter-channel time difference τ.
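A minimal sketch of the phase-based calculation, following Expressions (21) to (24) literally, is given below. The function names, the small constant eps that guards against empty frequency bins, and the direct evaluation of the sum in Expression (24) for each candidate are assumptions of this sketch. NumPy's forward FFT uses the exponent −j2πkt/T, which corresponds to Expressions (21) and (22) up to the 1/√T factor applied explicitly here.

```python
import numpy as np


def phase_difference_signal(x1, x2, candidates, eps=1e-12):
    """Return the complex phase difference signal psi(tau_cand) for each
    candidate shift, following Expressions (21) to (24)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    T = len(x1)
    f1 = np.fft.fft(x1) / np.sqrt(T)   # Expression (21)
    f2 = np.fft.fft(x2) / np.sqrt(T)   # Expression (22)
    u1 = f1 / np.maximum(np.abs(f1), eps)
    u2 = f2 / np.maximum(np.abs(f2), eps)
    # Expression (23): for unit-magnitude u2, u1 / u2 equals u1 * conj(u2).
    # Writing it this way makes an empty bin contribute zero instead of
    # dividing by zero (an assumption of this sketch).
    phi = u1 * np.conj(u2)
    k = np.arange(T)
    # Expression (24), evaluated directly so that non-integer candidates
    # are also possible.
    return np.array([
        np.sum(phi * np.exp(1j * 2.0 * np.pi * k * tau / T)) / np.sqrt(T)
        for tau in candidates
    ])


def estimate_time_difference_by_phase(x1, x2, candidates):
    """Return (tau, gamma) where gamma = |psi(tau)| is the largest value."""
    candidates = list(candidates)
    gamma = np.abs(phase_difference_signal(x1, x2, candidates))
    best = int(np.argmax(gamma))
    return candidates[best], float(gamma[best])
```

Note that, when the expressions are implemented exactly as written, a second channel that is a circular delay of the first channel by d samples produces its peak at τ_(cand) = −d. A caller that needs a positive value when the first channel is preceding would have to negate the result or exchange the roles of f₁(k) and f₂(k); this is an observation about the sketch and not a statement from the description.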

Note that, instead of using the absolute value of the phase difference signal ψ(τ_(cand)) without change as the correlation value γ_(cand), the inter-channel relationship information estimation unit 1132 may use a normalized value such as, for each τ_(cand), a relative difference between the absolute value of the phase difference signal ψ(τ_(cand)) and the average of the absolute values of the phase difference signals obtained for a plurality of numbers of possible samples, for example, those before and after τ_(cand). Specifically, the inter-channel relationship information estimation unit 1132 may obtain an average value by the following Expression (25) for each τ_(cand) by using a predetermined positive number τ_(range), and obtain a normalized correlation value obtained by the following Expression (26) using the obtained average value ψ_(c)(τ_(cand)) and the phase difference signal ψ(τ_(cand)) as γ_(cand).

[Math. 17] $\begin{matrix}{{\psi_{c}( \tau_{cand} )} = {\frac{1}{{2\tau_{range}} + 1}{\sum\limits_{\tau' = {\tau_{cand} - \tau_{range}}}^{\tau_{cand} + \tau_{range}}{|{\psi( \tau' )}|}}}} & (25)\end{matrix}$

[Math. 18] $\begin{matrix}{1 - \frac{\psi_{c}( \tau_{cand} )}{|{\psi( \tau_{cand} )}|}} & (26)\end{matrix}$

Note that the normalized correlation value obtained by Expression (26) is a value of 0 or more and 1 or less, and is a value having properties of being close to one as τ_(cand) is likely to be the inter-channel time difference, and being close to zero as τ_(cand) is not likely to be the inter-channel time difference.
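The normalization of Expressions (25) and (26) can be sketched as follows for integer candidates. The function name, the use of a dictionary that maps each shift τ′ to |ψ(τ′)|, and the value returned when |ψ(τ_(cand))| is zero are assumptions of this sketch.

```python
def normalized_correlation(abs_psi, tau_cand, tau_range):
    """Normalized correlation value for one candidate shift.

    abs_psi maps each integer shift tau' to |psi(tau')| and must cover
    tau_cand - tau_range through tau_cand + tau_range.
    """
    window = [abs_psi[t]
              for t in range(tau_cand - tau_range, tau_cand + tau_range + 1)]
    psi_c = sum(window) / (2 * tau_range + 1)      # Expression (25)
    if abs_psi[tau_cand] == 0:
        # Assumption: treat an all-zero candidate as "not likely".
        return 0.0
    return 1.0 - psi_c / abs_psi[tau_cand]         # Expression (26)
```

For an isolated peak of height 1 surrounded by zeros and τ_(range) = 2, the average is 1/5 and the normalized value is 0.8; for a flat |ψ| the value is 0.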

Each number of possible samples determined in advance may be each integer value from τ_(max) to τ_(min), may include a fractional value or a decimal value between τ_(max) and τ_(min), and need not include every integer value between τ_(max) and τ_(min). In addition, τ_(max)=−τ_(min) may be satisfied or may not be satisfied. In addition, in a case where a special decoded sound signal in which one of the channels is always preceding is targeted, τ_(max) and τ_(min) may both be positive numbers, or τ_(max) and τ_(min) may both be negative numbers.

Note that, in a case where the sound signal purification device 1102 obtains the n-th channel purification weight α_(n) in the seventh example described in the first embodiment, the inter-channel relationship information estimation unit 1132 further outputs, as the inter-channel correlation coefficient γ, the correlation value between the sample sequence of the first channel decoded sound signal and the sample sequence of the second channel decoded sound signal at a position shifted backward from the sample sequence by the inter-channel time difference τ, that is, the maximum value among the correlation values γ_(cand) calculated for each number of possible samples τ_(cand) from τ_(max) to τ_(min).

Further, for example, the inter-channel relationship information estimation unit 1132 may obtain the inter-channel correlation coefficient γ by also using the monaural decoded sound signal. In this case, as indicated by a two-dot chain line in FIG. 5, the monaural decoded sound signal input to the sound signal purification device 1102 is also input to the inter-channel relationship information estimation unit 1132. The inter-channel relationship information estimation unit 1132 may use the first channel decoded sound signal ^({circumflex over ( )})X₁={^({circumflex over ( )})x₁(1), ^({circumflex over ( )})x₁(2), . . . , ^({circumflex over ( )})x₁(T)}, the second channel decoded sound signal ^({circumflex over ( )})X₂={^({circumflex over ( )})x₂(1), ^({circumflex over ( )})x₂(2), . . . , ^({circumflex over ( )})x₂(T)}, and the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} to obtain, as the inter-channel correlation coefficient γ, a most appropriate weight when it is assumed that the monaural decoded sound signal ^({circumflex over ( )})X_(M) is approximated by the weighted sum of the first channel decoded sound signal ^({circumflex over ( )})X₁ and the second channel decoded sound signal ^({circumflex over ( )})X₂. That is, the inter-channel relationship information estimation unit 1132 may obtain, as the inter-channel correlation coefficient γ, the weight w_(cand) that minimizes the value obtained by the following Expression (27) among w_(cand) of −1 or more and 1 or less.

[Math. 19] $\begin{matrix}{\sum\limits_{t = 1}^{T}{|{( {{\frac{1 + w_{cand}}{2}{{\hat{x}}_{1}(t)}} + {\frac{1 - w_{cand}}{2}{{\hat{x}}_{2}(t)}}} ) - {{\hat{x}}_{M}(t)}}|}^{2}} & (27)\end{matrix}$

In a case where the correlation between the channels is high, that is, in a case where the first channel input sound signal input to the encoding device 500 and the second channel input sound signal input to the encoding device 500 have similar waveforms when the time difference between them is aligned, assuming that downmixing is efficiently performed in the downmixing unit 510 of the encoding device 500, the monaural decoded sound signal includes many signal components that are temporally synchronized with the decoded sound signal of the preceding channel out of the first channel decoded sound signal and the second channel decoded sound signal. Therefore, the inter-channel correlation coefficient γ obtained by Expression (27) is a value close to one in a case where the sound signal included in the first channel decoded sound signal is preceding, and is a value close to −1 in a case where the sound signal included in the second channel decoded sound signal is preceding, and the absolute value decreases as the correlation between the channels decreases. Therefore, the weight w_(cand) with which the value obtained by Expression (27) is the smallest can be used as the inter-channel correlation coefficient γ. Note that, in this method, the inter-channel relationship information estimation unit 1132 can obtain the inter-channel correlation coefficient γ without obtaining the inter-channel time difference τ.
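One possible way to find the minimizing weight is sketched below. Expression (27) is quadratic in w_(cand), so instead of searching over candidates, this sketch uses the closed-form minimizer and clips it to the interval from −1 to 1; the closed form, the function name, and the value returned when the two channels are identical are assumptions of this sketch and are not described in the source.

```python
import numpy as np


def correlation_coefficient_from_monaural(x1, x2, xm):
    """Weight in [-1, 1] that minimizes Expression (27).

    With s = (x1 + x2) / 2 and d = (x1 - x2) / 2, the weighted sum in
    Expression (27) equals s + w * d, so the cost is sum((s + w*d - xm)**2).
    """
    x1, x2, xm = (np.asarray(v, dtype=float) for v in (x1, x2, xm))
    s = 0.5 * (x1 + x2)
    d = 0.5 * (x1 - x2)
    denom = np.dot(d, d)
    if denom == 0.0:
        # Assumption: when both channels are identical every weight gives
        # the same cost, so return the neutral value 0.
        return 0.0
    w = np.dot(d, xm - s) / denom       # unconstrained least-squares minimizer
    # The cost is convex in w, so clipping gives the constrained minimizer.
    return float(np.clip(w, -1.0, 1.0))
```

In this sketch the result is 1 when the monaural signal equals the first channel, −1 when it equals the second channel, and 0 when it is the plain average of the two.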

[Monaural Decoded Sound Upmixing Unit 1172]

The monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} input to the sound signal purification device 1102 and the inter-channel relationship information output by the inter-channel relationship information estimation unit 1132 are input to the monaural decoded sound upmixing unit 1172. The monaural decoded sound upmixing unit 1172 performs an upmixing process using the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} and the inter-channel relationship information, to thereby obtain and output an n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1), ^({circumflex over ( )})x_(Mn)(2), . . . , ^({circumflex over ( )})x_(Mn)(T)} that is a signal obtained by upmixing the monaural decoded sound signal for the each channel (step S1172). The inter-channel relationship information used by the monaural decoded sound upmixing unit 1172 is information indicating a relationship between the channels of the stereo, and may be one type or a plurality of types. The monaural decoded sound upmixing unit 1172 is only required to perform the upmixing process as follows using, for example, the inter-channel time difference τ, or information indicating the number of samples |τ| corresponding to the time difference between the first channel and the second channel and information indicating which channel of the first channel and the second channel is preceding.

[[Example of Upmixing Process Using Inter-Channel Time Difference τ]]

In a case where the first channel is preceding (that is, in a case where the inter-channel time difference τ is a positive value, or in a case where the information indicating which channel of the first channel and the second channel is preceding indicates that the first channel is preceding), the monaural decoded sound upmixing unit 1172 outputs the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} without change as the first channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(M1)={^({circumflex over ( )})x_(M1)(1), ^({circumflex over ( )})x_(M1)(2), . . . , ^({circumflex over ( )})x_(M1)(T)}, and outputs a signal {^({circumflex over ( )})x_(M)(1−|τ|), ^({circumflex over ( )})x_(M)(2−|τ|), . . . , ^({circumflex over ( )})x_(M)(T−|τ|)} obtained by delaying the monaural decoded sound signal by |τ| samples (the number of samples corresponding to the absolute value of the inter-channel time difference τ, that is, the number of samples corresponding to the magnitude represented by the inter-channel time difference τ) as the second channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(M2)={^({circumflex over ( )})x_(M2)(1), ^({circumflex over ( )})x_(M2)(2), . . . , ^({circumflex over ( )})x_(M2)(T)}.

In a case where the second channel is preceding (that is, in a case where the inter-channel time difference τ is a negative value, or in a case where the information indicating which channel of the first channel and the second channel is preceding indicates that the second channel is preceding), the monaural decoded sound upmixing unit 1172 outputs a signal {^({circumflex over ( )})x_(M)(1−|τ|), ^({circumflex over ( )})x_(M)(2−|τ|), . . . , ^({circumflex over ( )})x_(M)(T−|τ|)} obtained by delaying the monaural decoded sound signal by |τ| samples as the first channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(M1)={^({circumflex over ( )})x_(M1)(1), ^({circumflex over ( )})x_(M1)(2), . . . , ^({circumflex over ( )})x_(M1)(T)}, and outputs the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} without change as the second channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(M2)={^({circumflex over ( )})x_(M2)(1), ^({circumflex over ( )})x_(M2)(2), . . . , ^({circumflex over ( )})x_(M2)(T)}.

In a case where no channel is preceding (that is, in a case where the inter-channel time difference τ is zero, or in a case where the information indicating which channel of the first channel and the second channel is preceding indicates that none of the channels is preceding), the monaural decoded sound upmixing unit 1172 outputs the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} without change as the first channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(M1)={^({circumflex over ( )})x_(M1)(1), ^({circumflex over ( )})x_(M1)(2), . . . , ^({circumflex over ( )})x_(M1)(T)} and the second channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(M2)={^({circumflex over ( )})x_(M2)(1), ^({circumflex over ( )})x_(M2)(2), . . . , ^({circumflex over ( )})x_(M2)(T)}.

That is, the monaural decoded sound upmixing unit 1172 outputs, for a channel in which the above-described arrival time is shorter out of the first channel and the second channel, the input monaural decoded sound signal without change as the upmixed monaural decoded sound signal of the channel, and outputs, for a channel in which the above-described arrival time is longer out of the first channel and the second channel, a signal obtained by delaying the input monaural decoded sound signal by the absolute value |τ| of the inter-channel time difference τ as the upmixed monaural decoded sound signal of the channel. Note that, since the monaural decoded sound signal of a past frame is used in the monaural decoded sound upmixing unit 1172 to obtain a signal obtained by delaying the monaural decoded sound signal, the monaural decoded sound signal input in the past frame is stored for a predetermined number of frames in the storage unit, which is not illustrated, in the monaural decoded sound upmixing unit 1172.
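The three cases of the upmixing process can be sketched as follows. The function name, the integer-valued τ, and the representation of the stored past samples as a single array xm_past (oldest sample first) are assumptions of this sketch.

```python
import numpy as np


def upmix_monaural(xm_current, xm_past, tau):
    """Return (xm1, xm2), the first and second channel upmixed monaural
    decoded sound signals for the current frame.

    xm_current holds the T samples of the current frame and xm_past the
    stored samples that immediately precede it. tau > 0 means the first
    channel is preceding, tau < 0 the second, tau == 0 neither.
    """
    xm_current = np.asarray(xm_current, dtype=float)
    xm_past = np.asarray(xm_past, dtype=float)
    T = len(xm_current)
    shift = abs(int(tau))
    if shift > len(xm_past):
        raise ValueError("not enough stored past samples for this delay")
    # Delayed signal {x_M(1-|tau|), ..., x_M(T-|tau|)}, taken from the
    # concatenation of the stored past samples and the current frame.
    history = np.concatenate([xm_past, xm_current])
    start = len(xm_past) - shift
    delayed = history[start:start + T]
    if tau > 0:
        return xm_current.copy(), delayed
    if tau < 0:
        return delayed, xm_current.copy()
    return xm_current.copy(), xm_current.copy()
```

For example, with stored past samples 1, 2, 3, 4, a current frame 5, 6, 7, 8, and τ = 2, the first channel output is 5, 6, 7, 8 and the second channel output is 3, 4, 5, 6.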

[n-Th Channel Purification Weight Estimation Unit 1112-n]

The n-th channel purification weight estimation unit 1112-n obtains and outputs the n-th channel purification weight α_(n) (step S1112-n). The n-th channel purification weight estimation unit 1112-n obtains the n-th channel purification weight α_(n) by a method similar to the method based on the principle of minimizing the quantization error described in the first embodiment. The n-th channel purification weight α_(n) obtained by the n-th channel purification weight estimation unit 1112-n is a value of 0 or more and 1 or less. However, since the n-th channel purification weight estimation unit 1112-n obtains the n-th channel purification weight α_(n) for the each frame by the method to be described later, the n-th channel purification weight α_(n) does not become zero or one in all the frames. That is, there is a frame in which the n-th channel purification weight α_(n) is a value larger than 0 and smaller than 1. In other words, in at least any one of all the frames, the n-th channel purification weight α_(n) is a value larger than 0 and smaller than 1.

Specifically, as in the following first to seventh examples, the n-th channel purification weight estimation unit 1112-n obtains the n-th channel purification weight α_(n) using the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) instead of the monaural decoded sound signal ^({circumflex over ( )})X_(M) at each position where the monaural decoded sound signal ^({circumflex over ( )})X_(M) is used in the method based on the principle of minimizing the quantization error described in the first embodiment. As a matter of course, the n-th channel purification weight estimation unit 1112-n uses the value obtained on the basis of the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) instead of the value obtained on the basis of the monaural decoded sound signal ^({circumflex over ( )})X_(M) at each position where the value obtained on the basis of the monaural decoded sound signal ^({circumflex over ( )})X_(M) is used in the method based on the principle of minimizing the quantization error described in the first embodiment. For example, the n-th channel purification weight estimation unit 1112-n uses the energy E_(Mn)(0) of the n-th channel upmixed monaural decoded sound signal of the current frame instead of the energy E_(M)(0) of the monaural decoded sound signal of the current frame, and uses the energy E_(Mn)(−1) of the n-th channel upmixed monaural decoded sound signal of the previous frame instead of the energy E_(M)(−1) of the monaural decoded sound signal of the previous frame.

First Example

The n-th channel purification weight estimation unit 1112-n of the first example obtains the n-th channel purification weight α_(n) by the following Expression (2-5) using the number of samples T per frame, the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS, and the number of bits b_(M) of the monaural code CM.

[Math. 20] $\begin{matrix}{\alpha_{n} = \frac{2^{- \frac{2b_{n}}{T}}}{2^{- \frac{2b_{n}}{T}} + 2^{- \frac{2b_{M}}{T}}}} & ( {2 - 5} )\end{matrix}$
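Expression (2-5) can be evaluated directly, as in the following sketch. The function name is hypothetical, and no safeguard against floating-point underflow for extremely large ratios of bits to frame length is included.

```python
def purification_weight_first_example(T, b_n, b_m):
    """n-th channel purification weight alpha_n by Expression (2-5).

    T is the number of samples per frame, b_n the number of bits for the
    n-th channel in the stereo code CS, b_m the bits of the monaural code CM.
    """
    q_n = 2.0 ** (-2.0 * b_n / T)
    q_m = 2.0 ** (-2.0 * b_m / T)
    return q_n / (q_n + q_m)
```

The result is 0.5 when b_(n) and b_(M) are equal, falls below 0.5 as b_(n) exceeds b_(M), and rises above 0.5 as b_(M) exceeds b_(n).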

Second Example

The n-th channel purification weight estimation unit 1112-n of thesecond example uses at least the number of bits b_(n) corresponding tothe n-th channel in the number of bits of the stereo code CS and thenumber of bits b_(M) of the monaural code CM to obtain a value that islarger than 0 and smaller than 1, 0.5 when b_(n) and b_(M) are equal,closer to 0 than 0.5 as b_(n) is larger than b_(M), and closer to 1 than0.5 as b_(M) is larger than b_(n) as the n-th channel purificationweight α_(n).

Third Example

The n-th channel purification weight estimation unit 1112-n of the thirdexample obtains a value c_(n)×r_(n) obtained by multiplying a correctioncoefficient c_(n) obtained by

[Math. 21] $\begin{matrix}{c_{n} = \frac{2^{- \frac{2b_{n}}{T}}}{2^{- \frac{2b_{n}}{T}} + 2^{- \frac{2b_{M}}{T}}}} & \text{(2-8)}\end{matrix}$

using the number of samples T per frame, the number of bits b_(n)corresponding to the n-th channel in the number of bits of the stereocode CS, and the number of bits b_(M) of the monaural code CM, and

the normalized inner product value r_(n) for the n-th channel upmixedmonaural decoded sound signal ^({circumflex over ( )})X_(Mn) of the n-thchannel decoded sound signal ^({circumflex over ( )})X_(n) as the n-thchannel purification weight α_(n).

The n-th channel purification weight estimation unit 1112-n of the third example obtains the n-th channel purification weight α_(n), for example, by performing the following steps S1112-31-n to S1112-33-n. The n-th channel purification weight estimation unit 1112-n first obtains the normalized inner product value r_(n) for the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) by the following Expression (2-6) from the n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} and the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1), ^({circumflex over ( )})x_(Mn)(2), . . . , ^({circumflex over ( )})x_(Mn)(T)} (step S1112-31-n).

[Math. 22] $\begin{matrix}{r_{n} = \frac{\sum_{t = 1}^{T}{{\hat{x}}_{n}(t)}{{\hat{x}}_{Mn}(t)}}{\sum_{t = 1}^{T}{{\hat{x}}_{Mn}(t)}{{\hat{x}}_{Mn}(t)}}} & \text{(2-6)}\end{matrix}$

The n-th channel purification weight estimation unit 1112-n also obtainsthe correction coefficient c_(n) by Expression (2-8) using the number ofsamples T per frame, the number of bits b_(n) corresponding to the n-thchannel in the number of bits of the stereo code CS, and the number ofbits b_(M) of the monaural code CM (step S1112-32-n). Next, the n-thchannel purification weight estimation unit 1112-n obtains the valuec_(n)×r_(n) obtained by multiplying the normalized inner product valuer_(n) obtained in step S1112-31-n by the correction coefficient c_(n)obtained in step S1112-32-n as the n-th channel purification weightα_(n) (step S1112-33-n).
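Steps S1112-31-n to S1112-33-n can be sketched roughly as follows. This is a minimal Python illustration, not the implementation of the embodiment: the function names are hypothetical, and returning 0 for a zero-energy upmixed monaural frame is an assumption that the text does not specify.

```python
from typing import Sequence


def normalized_inner_product(x_n: Sequence[float], x_mn: Sequence[float]) -> float:
    """Expression (2-6): inner product of x_n and x_Mn divided by the energy of x_Mn."""
    num = sum(a * b for a, b in zip(x_n, x_mn))
    den = sum(b * b for b in x_mn)
    # Assumption: a silent upmixed monaural frame yields r_n = 0.
    return num / den if den > 0.0 else 0.0


def correction_coefficient(b_n: int, b_m: int, t: int) -> float:
    """Expression (2-8)."""
    q_n = 2.0 ** (-2.0 * b_n / t)
    q_m = 2.0 ** (-2.0 * b_m / t)
    return q_n / (q_n + q_m)


def purification_weight_third_example(
    x_n: Sequence[float], x_mn: Sequence[float], b_n: int, b_m: int
) -> float:
    """alpha_n = c_n * r_n (steps S1112-31-n to S1112-33-n)."""
    t = len(x_n)  # number of samples T per frame
    r_n = normalized_inner_product(x_n, x_mn)  # step S1112-31-n
    c_n = correction_coefficient(b_n, b_m, t)  # step S1112-32-n
    return c_n * r_n                           # step S1112-33-n
```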

Fourth Example

The n-th channel purification weight estimation unit 1112-n of the fourth example uses the number of bits corresponding to the n-th channel in the number of bits of the stereo code CS as b_(n) and the number of bits of the monaural code CM as b_(M) to obtain, as the n-th channel purification weight α_(n), the value c_(n)×r_(n) obtained by multiplying r_(n) by the correction coefficient c_(n). Here, r_(n) is a value of 0 or more and 1 or less that is closer to 1 as the correlation between the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) is higher, and closer to 0 as the correlation is lower. The correction coefficient c_(n) is a value larger than 0 and smaller than 1 that is 0.5 when b_(n) and b_(M) are equal, closer to 0 than 0.5 as b_(n) is larger than b_(M), and closer to 1 than 0.5 as b_(n) is smaller than b_(M).

Fifth Example

The n-th channel purification weight estimation unit 1112-n of the fifthexample obtains the n-th channel purification weight α_(n) by, forexample, performing the following steps S1112-51-n to S1112-55-n.

The n-th channel purification weight estimation unit 1112-n firstobtains the inner product value E_(n)(0) to be used in the current frameby the following Expression (2-9) using the n-th channel decoded soundsignal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1),^({circumflex over ( )})x_(n)(2), . . . ,^({circumflex over ( )})x_(n)(T)}, the n-th channel upmixed monauraldecoded sound signal^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1),^({circumflex over ( )})x_(Mn)(2), . . . ,^({circumflex over ( )})x_(Mn)(T)}, and the inner product valueE_(n)(−1) that has been used in the previous frame (step S1112-51-n).

[Math. 23] $\begin{matrix}{{E_{n}(0)} = {{\epsilon_{n}{E_{n}(-1)}} + {\frac{(1 - \epsilon_{n})}{T}{\sum\limits_{t = 1}^{T}{{{\hat{x}}_{n}(t)}{{\hat{x}}_{Mn}(t)}}}}}} & \text{(2-9)}\end{matrix}$

Here, ε_(n) is a predetermined value larger than 0 and smaller than 1,and is stored in advance in the n-th channel purification weightestimation unit 1112-n. Note that the n-th channel purification weightestimation unit 1112-n stores the obtained inner product value E_(n)(0)in the n-th channel purification weight estimation unit 1112-n in orderto use this inner product value E_(n)(0) as the “inner product valueE_(n)(−1) that has been used in the previous frame” in the next frame.

The n-th channel purification weight estimation unit 1112-n also obtainsthe energy E_(Mn)(0) of the n-th channel upmixed monaural decoded soundsignal to be used in the current frame by the following Expression(2-10) using the n-th channel upmixed monaural decoded sound signal^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1),^({circumflex over ( )})x_(Mn)(2), . . . ,^({circumflex over ( )})x_(Mn)(T)} and the energy E_(Mn)(−1) of the n-thchannel upmixed monaural decoded sound signal that has been used in theprevious frame (step S1112-52-n).

[Math. 24] $\begin{matrix}{{E_{Mn}(0)} = {{\epsilon_{Mn}{E_{Mn}(-1)}} + {\frac{(1 - \epsilon_{Mn})}{T}{\sum\limits_{t = 1}^{T}{{{\hat{x}}_{Mn}(t)}{{\hat{x}}_{Mn}(t)}}}}}} & \text{(2-10)}\end{matrix}$

Here, ε_(Mn) is a predetermined value larger than 0 and smaller than 1,and is stored in advance in the n-th channel purification weightestimation unit 1112-n. Note that the n-th channel purification weightestimation unit 1112-n stores the energy E_(Mn)(0) of the obtained n-thchannel upmixed monaural decoded sound signal in the n-th channelpurification weight estimation unit 1112-n in order to use this energyE_(Mn)(0) as the “energy E_(Mn)(−1) of the n-th channel upmixed monauraldecoded sound signal that has been used in the previous frame” in thenext frame.

Next, the n-th channel purification weight estimation unit 1112-nobtains the normalized inner product value r_(n) by the followingExpression (2-11) using the inner product value E_(n)(0) to be used inthe current frame obtained in step S1112-51-n and the energy E_(Mn)(0)of the n-th channel upmixed monaural decoded sound signal to be used inthe current frame obtained in step S1112-52-n (step S1112-53-n).

[Math. 25] $\begin{matrix}{r_{n} = {{E_{n}(0)}/{E_{Mn}(0)}}} & \text{(2-11)}\end{matrix}$

The n-th channel purification weight estimation unit 1112-n also obtains the correction coefficient c_(n) by Expression (2-8) (step S1112-54-n). Next, the n-th channel purification weight estimation unit 1112-n obtains the value c_(n)×r_(n) obtained by multiplying the normalized inner product value r_(n) obtained in step S1112-53-n by the correction coefficient c_(n) obtained in step S1112-54-n as the n-th channel purification weight α_(n) (step S1112-55-n).

That is, the n-th channel purification weight estimation unit 1112-n ofthe fifth example obtains the value c_(n)×r_(n) obtained by multiplyingthe normalized inner product value r_(n) obtained by Expression (2-11)using the inner product value E_(n)(0) obtained by Expression (2-9)using each sample value ^({circumflex over ( )})x_(n)(t) of the n-thchannel decoded sound signal ^({circumflex over ( )})X_(n), each samplevalue ^({circumflex over ( )})x_(Mn)(t) of the n-th channel upmixedmonaural decoded sound signal ^({circumflex over ( )})X_(Mn), and theinner product value E_(n)(−1) of the previous frame, and the energyE_(Mn)(0) of the n-th channel upmixed monaural decoded sound signalobtained by Expression (2-10) using each sample value^({circumflex over ( )})x_(Mn)(t) of the n-th channel upmixed monauraldecoded sound signal ^({circumflex over ( )})X_(Mn) and the energyE_(Mn)(−1) of the n-th channel upmixed monaural decoded sound signal ofthe previous frame, by the correction coefficient c_(n) obtained byExpression (2-8) using the number of samples T per frame, the number ofbits b_(n) corresponding to the n-th channel in the number of bits ofthe stereo code CS, and the number of bits b_(M) of the monaural codeCM, as the n-th channel purification weight α_(n).
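The per-frame recursion of the fifth example (steps S1112-51-n to S1112-55-n) might be sketched as below. The class name, the default values 0.9 for ε_(n) and ε_(Mn), the zero initial state before the first frame, and the zero-energy guard are all assumptions made for illustration; the text only states that ε_(n) and ε_(Mn) are predetermined values larger than 0 and smaller than 1.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SmoothedWeightEstimator:
    """Holds E_n(-1) and E_Mn(-1) between frames for one channel n."""

    eps_n: float = 0.9      # epsilon_n, assumed value in (0, 1)
    eps_mn: float = 0.9     # epsilon_Mn, assumed value in (0, 1)
    e_n_prev: float = 0.0   # E_n(-1); zero initial state is an assumption
    e_mn_prev: float = 0.0  # E_Mn(-1); zero initial state is an assumption

    def update(self, x_n: Sequence[float], x_mn: Sequence[float],
               b_n: int, b_m: int) -> float:
        t = len(x_n)  # number of samples T per frame
        inner = sum(a * b for a, b in zip(x_n, x_mn))
        energy = sum(b * b for b in x_mn)
        # Expression (2-9): smoothed inner product value E_n(0)
        e_n = self.eps_n * self.e_n_prev + (1.0 - self.eps_n) / t * inner
        # Expression (2-10): smoothed energy E_Mn(0)
        e_mn = self.eps_mn * self.e_mn_prev + (1.0 - self.eps_mn) / t * energy
        # Store the current values for use as E_n(-1), E_Mn(-1) in the next frame
        self.e_n_prev, self.e_mn_prev = e_n, e_mn
        # Expression (2-11); the guard against E_Mn(0) = 0 is an assumption
        r_n = e_n / e_mn if e_mn > 0.0 else 0.0
        # Expression (2-8)
        q_n = 2.0 ** (-2.0 * b_n / t)
        q_m = 2.0 ** (-2.0 * b_m / t)
        c_n = q_n / (q_n + q_m)
        return c_n * r_n  # alpha_n
```

One estimator instance would be kept per channel, and `update` would be called once per frame so that the stored values play the role of the "previous frame" quantities.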

Sixth Example

The n-th channel purification weight estimation unit 1112-n of the sixthexample obtains a value λ×c_(n)×r_(n) obtained by multiplying thenormalized inner product value r_(n) and the correction coefficientc_(n) described in the third example or the normalized inner productvalue r_(n) and the correction coefficient c_(n) described in the fifthexample by λ that is a predetermined value larger than 0 and smallerthan 1 as the n-th channel purification weight α_(n).

Seventh Example

The n-th channel purification weight estimation unit 1112-n of theseventh example obtains the value γ×c_(n)×r_(n) obtained by multiplyingthe normalized inner product value r_(n) and the correction coefficientc_(n) described in the third example or the normalized inner productvalue r_(n) and the correction coefficient c_(n) described in the fifthexample by the inter-channel correlation coefficient γ which is thecorrelation coefficient between the first channel decoded sound signaland the second channel decoded sound signal, as the n-th channelpurification weight α_(n).

[n-Th Channel Signal Purification Unit 1122-n]

The n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} input to the sound signal purification device 1102, the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1), ^({circumflex over ( )})x_(Mn)(2), . . . , ^({circumflex over ( )})x_(Mn)(T)} output by the monaural decoded sound upmixing unit 1172, and the n-th channel purification weight α_(n) output by the n-th channel purification weight estimation unit 1112-n are input to the n-th channel signal purification unit 1122-n. For each corresponding sample t, the n-th channel signal purification unit 1122-n obtains and outputs a sequence based on a value ^(˜)x_(n)(t) obtained by adding a value α_(n)×^({circumflex over ( )})x_(Mn)(t) obtained by multiplying the n-th channel purification weight α_(n) by the sample value ^({circumflex over ( )})x_(Mn)(t) of the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) and a value (1−α_(n))×^({circumflex over ( )})x_(n)(t) obtained by multiplying a value (1−α_(n)) obtained by subtracting the n-th channel purification weight α_(n) from 1 by the sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), as the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)} (step S1122-n). That is, ^(˜)x_(n)(t)=(1−α_(n))×^({circumflex over ( )})x_(n)(t)+α_(n)×^({circumflex over ( )})x_(Mn)(t).
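Step S1122-n is a per-sample convex combination of the two inputs. A minimal Python sketch follows; the function name is hypothetical, and plain lists of floats are assumed for the frame signals.

```python
from typing import List, Sequence


def purify_channel(x_n: Sequence[float], x_mn: Sequence[float],
                   alpha_n: float) -> List[float]:
    """Return the purified frame: (1 - alpha_n) * x_n(t) + alpha_n * x_Mn(t) for each t."""
    return [(1.0 - alpha_n) * a + alpha_n * b for a, b in zip(x_n, x_mn)]
```

With α_(n)=0 the output equals the n-th channel decoded sound signal, and with α_(n)=1 it equals the n-th channel upmixed monaural decoded sound signal.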

Third Embodiment

Similarly to the sound signal purification device of the firstembodiment and the second embodiment, a sound signal purification deviceof a third embodiment also improves the decoded sound signal of the eachchannel of the stereo by using a monaural decoded sound signal obtainedfrom a code different from the code from which the decoded sound signalis obtained. The sound signal purification device of the thirdembodiment is different from the sound signal purification device of thesecond embodiment in that the inter-channel relationship information isobtained not from a decoded sound signal but from a code. Hereinafter,regarding the sound signal purification device of the third embodiment,differences from the sound signal purification device of the secondembodiment will be described using an example in a case where the numberof channels of the stereo is two.

<<Sound Signal Purification Device 1103>>

As illustrated in FIG. 7 , the sound signal purification device 1103 ofthe third embodiment includes an inter-channel relationship informationdecoding unit 1143, the monaural decoded sound upmixing unit 1172, thefirst channel purification weight estimation unit 1112-1, the firstchannel signal purification unit 1122-1, the second channel purificationweight estimation unit 1112-2, and the second channel signalpurification unit 1122-2. For the each frame, as illustrated in FIG. 8 ,the sound signal purification device 1103 performs steps S1143 andS1172, and steps S1112-n and S1122-n for the each channel.

The sound signal purification device 1103 of the third embodiment isdifferent from the sound signal purification device 1102 of the secondembodiment in that the inter-channel relationship information decodingunit 1143 is provided instead of the inter-channel relationshipinformation estimation unit 1132, and step S1143 is performed instead ofstep S1132. Further, the inter-channel relationship information code CCof the each frame is also input to the sound signal purification device1103 of the third embodiment. The inter-channel relationship informationcode CC may be a code obtained and output by the inter-channelrelationship information encoding unit, which is not illustrated,included in the above-described encoding device 500, or may be a codeincluded in the stereo code CS obtained and output by the stereoencoding unit 530 of the above-described encoding device 500.

Hereinafter, differences between the sound signal purification device1103 of the third embodiment and the sound signal purification device1102 of the second embodiment will be described.

[Inter-Channel Relationship Information Decoding Unit 1143]

The inter-channel relationship information code CC input to the soundsignal purification device 1103 is input to the inter-channelrelationship information decoding unit 1143. The inter-channelrelationship information decoding unit 1143 decodes the inter-channelrelationship information code CC to obtain and output the inter-channelrelationship information (step S1143). The inter-channel relationshipinformation obtained by the inter-channel relationship informationdecoding unit 1143 is the same as the inter-channel relationshipinformation obtained by the inter-channel relationship informationestimation unit 1132 of the second embodiment.

Modification Example of Third Embodiment

In a case where the inter-channel relationship information code CC is acode included in the stereo code CS, the same inter-channel relationshipinformation obtained in step S1143 is obtained by decoding in the stereodecoding unit 620 of the decoding device 600.

Therefore, in a case where the inter-channel relationship informationcode CC is a code included in the stereo code CS, the inter-channelrelationship information obtained by the stereo decoding unit 620 of thedecoding device 600 may be input to the sound signal purification device1103 of the third embodiment, and the sound signal purification device1103 of the third embodiment may not include the inter-channelrelationship information decoding unit 1143 and may not perform stepS1143.

Further, in a case where only a part of the inter-channel relationshipinformation code CC is a code included in the stereo code CS, it is onlyrequired that the inter-channel relationship information obtained bydecoding the code included in the stereo code CS in the inter-channelrelationship information code CC by the stereo decoding unit 620 of thedecoding device 600 is input to the sound signal purification device1103 of the third embodiment, and that the inter-channel relationshipinformation decoding unit 1143 of the sound signal purification device1103 of the third embodiment decodes, as step S1143, a code not includedin the stereo code CS in the inter-channel relationship information codeCC to obtain and output the inter-channel relationship information thathas not been input to the sound signal purification device 1103.

Further, in a case where a code corresponding to a part of theinter-channel relationship information used by each unit of the soundsignal purification device 1103 is not included in the inter-channelrelationship information code CC, the sound signal purification device1103 of the third embodiment is only required to also include theinter-channel relationship information estimation unit 1132, so that theinter-channel relationship information estimation unit 1132 alsoperforms step S1132. In this case, in step S1132, the inter-channelrelationship information estimation unit 1132 is only required to obtainand output the inter-channel relationship information that cannot beobtained by decoding the inter-channel relationship information code CCamong pieces of the inter-channel relationship information used byrespective units of the sound signal purification device 1103, similarlyto step S1132 of the second embodiment.

Fourth Embodiment

Similarly to the sound signal purification device of the first to thirdembodiments, a sound signal purification device of a fourth embodimentalso improves the decoded sound signal of the each channel of the stereoby using a monaural decoded sound signal obtained from a code differentfrom the code from which the decoded sound signal is obtained.Hereinafter, the sound signal purification device of the fourthembodiment will be described with reference to the sound signalpurification devices of the above-described embodiments as appropriateusing an example in a case where the number of channels of the stereo istwo.

As illustrated in FIG. 9 , the sound signal purification device 1201 ofthe fourth embodiment includes a decoded sound common signal estimationunit 1251, a common signal purification weight estimation unit 1211, acommon signal purification unit 1221, a first channel separationcombination weight estimation unit 1281-1, a first channel separationcombination unit 1291-1, a second channel separation combination weightestimation unit 1281-2, and a second channel separation combination unit1291-2.

The sound signal purification device 1201 operates, for example, in units of frames having a predetermined time length of 20 ms. For the decoded sound common signal, which is a signal common to all channels of the decoded sound of the stereo, the sound signal purification device 1201 obtains a purified common signal, which is a sound signal obtained by improving the decoded sound common signal, from the decoded sound common signal and the monaural decoded sound signal. For each channel of the stereo, the sound signal purification device 1201 then obtains and outputs a purified decoded sound signal, which is a sound signal obtained by improving the decoded sound signal of the channel, from the decoded sound common signal, the purified common signal, and the decoded sound signal of the channel. The decoded sound signals of the respective channels input in units of frames to the sound signal purification device 1201 are, for example, the first channel decoded sound signal ^({circumflex over ( )})X₁={^({circumflex over ( )})x₁(1), ^({circumflex over ( )})x₁(2), . . . , ^({circumflex over ( )})x₁(T)} of T samples and the second channel decoded sound signal ^({circumflex over ( )})X₂={^({circumflex over ( )})x₂(1), ^({circumflex over ( )})x₂(2), . . . , ^({circumflex over ( )})x₂(T)} of T samples obtained by the stereo decoding unit 620 of the decoding device 600 described above decoding the b_(S)-bit stereo code CS, which is a code different from the monaural code CM, without using the monaural code CM or the information obtained by decoding the monaural code CM. The monaural decoded sound signal input in units of frames to the sound signal purification device 1201 is, for example, the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} of T samples obtained by the monaural decoding unit 610 of the decoding device 600 described above decoding the b_(M)-bit monaural code CM, which is a code different from the stereo code CS, without using the stereo code CS or the information obtained by decoding the stereo code CS. The monaural code CM is a code derived from the same sound signal as the sound signal from which the stereo code CS is derived (that is, the first channel input sound signal X₁ and the second channel input sound signal X₂ input to the encoding device 500), but is a code different from the code from which the first channel decoded sound signal ^({circumflex over ( )})X₁ and the second channel decoded sound signal ^({circumflex over ( )})X₂ are obtained (that is, the stereo code CS). Assuming that the channel number n of the first channel is 1 and the channel number n of the second channel is 2, the sound signal purification device 1201 performs, for each frame, steps S1251, S1211, and S1221, and steps S1281-n and S1291-n for each channel, as illustrated in FIG. 10.

[Decoded Sound Common Signal Estimation Unit 1251]

At least the first channel decoded sound signal ^({circumflex over ( )})X₁={^({circumflex over ( )})x₁(1), ^({circumflex over ( )})x₁(2), . . . , ^({circumflex over ( )})x₁(T)} and the second channel decoded sound signal ^({circumflex over ( )})X₂={^({circumflex over ( )})x₂(1), ^({circumflex over ( )})x₂(2), . . . , ^({circumflex over ( )})x₂(T)} input to the sound signal purification device 1201 are input to the decoded sound common signal estimation unit 1251. The decoded sound common signal estimation unit 1251 obtains and outputs a decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} by using at least the first channel decoded sound signal ^({circumflex over ( )})X₁ and the second channel decoded sound signal ^({circumflex over ( )})X₂ (step S1251). The decoded sound common signal estimation unit 1251 is only required to use, for example, any of the following methods.

[[First Method for Obtaining Decoded Sound Common Signal]]

In a first method, the decoded sound common signal estimation unit 1251also uses the monaural decoded sound signal^({circumflex over ( )})X_(M) input to the sound signal purificationdevice 1201 to obtain and output the decoded sound common signal^({circumflex over ( )})Y_(M). That is, in the case of using the firstmethod, the first channel decoded sound signal^({circumflex over ( )})X₁={^({circumflex over ( )})x₁(1),^({circumflex over ( )})x₁(2), . . . , ^({circumflex over ( )})x₁(T)},the second channel decoded sound signal^({circumflex over ( )})X₂={^({circumflex over ( )})x₂(1),^({circumflex over ( )})x₂(2), . . . , ^({circumflex over ( )})x₂(T)},and the monaural decoded sound signal^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1),^({circumflex over ( )})x_(M)(2), . . . ,^({circumflex over ( )})x_(M)(T)} input to the sound signal purificationdevice 1201 are input to the decoded sound common signal estimation unit1251. First, the decoded sound common signal estimation unit 1251obtains a weighting coefficient that minimizes the difference betweenthe weighted average of the decoded sound signals of all channels of thestereo (weighted average of decoded sound signals^({circumflex over ( )})X₁, . . . , ^({circumflex over ( )})X_(N) of allchannels from the first to the N-th channel) and the monaural decodedsound signal (step S1251A-1). For example, the decoded sound commonsignal estimation unit 1251 obtains w_(cand) having a minimum valueobtained by the following Expression (41) among w_(cand) of −1 or moreand 1 or less as the weighting coefficient w.

[Math. 26] $\begin{matrix}{\sum\limits_{t = 1}^{T}\left| {\left( {{\frac{1 + w_{cand}}{2}{{\hat{x}}_{1}(t)}} + {\frac{1 - w_{cand}}{2}{{\hat{x}}_{2}(t)}}} \right) - {{\hat{x}}_{M}(t)}} \right|^{2}} & (41)\end{matrix}$

Next, the decoded sound common signal estimation unit 1251 obtains aweighted average of the decoded sound signals of all channels of thestereo using the weighting coefficients (weighted average of the decodedsound signals ^({circumflex over ( )})X₁, . . . ,^({circumflex over ( )})X_(N) of all the channels from the first to theN-th channel) obtained in step S1251A-1, as the decoded sound commonsignal (step S1251A-2). For example, the decoded sound common signalestimation unit 1251 obtains the decoded sound common signal^({circumflex over ( )})y_(M)(t) for each sample number t by thefollowing Expression (42).

[Math. 27] $\begin{matrix}{{{\hat{y}}_{M}(t)} = {{\frac{1 + w}{2}{{\overset{\hat{}}{x}}_{1}(t)}} + {\frac{1 - w}{2}{{\overset{\hat{}}{x}}_{2}(t)}}}} & (42)\end{matrix}$
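For the two-channel case, steps S1251A-1 and S1251A-2 can be sketched as follows. Note the substitution made here: the text selects w from among candidates w_(cand) of −1 or more and 1 or less, whereas this sketch uses the closed-form least-squares minimizer of Expression (41) and clips it to that interval, which gives the constrained minimum because Expression (41) is a convex quadratic in w_(cand). The function name and the choice w=0 when the two channels are identical are assumptions.

```python
from typing import List, Sequence, Tuple


def common_signal_first_method(
    x1: Sequence[float], x2: Sequence[float], xm: Sequence[float]
) -> Tuple[float, List[float]]:
    """Return (w, y_M) for Expressions (41) and (42) with two channels."""
    # Rewrite the weighted average as mid(t) + w * side(t).
    mid = [(a + b) / 2.0 for a, b in zip(x1, x2)]
    side = [(a - b) / 2.0 for a, b in zip(x1, x2)]
    den = sum(s * s for s in side)
    if den > 0.0:
        # Unconstrained minimizer of sum_t |mid(t) + w * side(t) - xm(t)|^2
        w = sum((m - c) * s for m, c, s in zip(xm, mid, side)) / den
    else:
        w = 0.0  # assumption: identical channels, any w gives the same signal
    w = max(-1.0, min(1.0, w))  # keep w within [-1, 1] as required by the text
    y_m = [c + w * s for c, s in zip(mid, side)]  # Expression (42)
    return w, y_m
```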

[[Second Method for Obtaining Decoded Sound Common Signal]]

A second method is a method corresponding to a case where the downmixingunit 510 of the encoding device 500 obtains the downmixed signal by the[[Second Method for Obtaining Downmixed Signal]]. In the second method,the decoded sound common signal estimation unit 1251 obtains the decodedsound common signal ^({circumflex over ( )})Y_(M) by performing stepS1251B described later. In a case of using the second method, the soundsignal purification device 1201 also includes an inter-channelrelationship information estimation unit 1231 as indicated by a brokenline in FIG. 9 in order to obtain the inter-channel correlationcoefficient γ and preceding channel information used in step S1251B tobe described later, and the inter-channel relationship informationestimation unit 1231 performs the following step S1231 before thedecoded sound common signal estimation unit 1251 performs step S1251B.

[[[Inter-Channel Relationship Information Estimation Unit 1231]]]

At least the first channel decoded sound signal^({circumflex over ( )})X₁ input to the sound signal purification device1201 and the second channel decoded sound signal^({circumflex over ( )})X₂ input to the sound signal purification device1201 are input to the inter-channel relationship information estimationunit 1231. The inter-channel relationship information estimation unit1231 obtains and outputs the inter-channel correlation coefficient γ andthe preceding channel information as the inter-channel relationshipinformation by using at least the first channel decoded sound signal^({circumflex over ( )})X₁ and the second channel decoded sound signal^({circumflex over ( )})X₂ (step S1231). The inter-channel correlationcoefficient γ is a correlation coefficient of the first channel decodedsound signal and the second channel decoded sound signal. The precedingchannel information is information indicating which of the first channeland the second channel is preceding. For example, the inter-channelrelationship information estimation unit 1231 performs the followingsteps S1231-1 to S1231-3.

The inter-channel relationship information estimation unit 1231 first obtains the inter-channel time difference τ by the method exemplified in the description of the inter-channel relationship information estimation unit 1132 of the second embodiment (step S1231-1). Next, the inter-channel relationship information estimation unit 1231 obtains and outputs, as the inter-channel correlation coefficient γ, the correlation value between the first channel decoded sound signal and the sample sequence of the second channel decoded sound signal at a position shifted backward from the sample sequence by the inter-channel time difference τ, that is, the maximum value among the correlation values γ_(cand) calculated for each candidate number of samples τ_(cand) from τ_(max) to τ_(min) (step S1231-2). In a case where the inter-channel time difference τ is a positive value, the inter-channel relationship information estimation unit 1231 also obtains and outputs information indicating that the first channel is preceding as the preceding channel information, and in a case where the inter-channel time difference τ is a negative value, the inter-channel relationship information estimation unit 1231 obtains and outputs information indicating that the second channel is preceding as the preceding channel information (step S1231-3). In a case where the inter-channel time difference τ is zero, the inter-channel relationship information estimation unit 1231 may obtain and output the information indicating that the first channel is preceding as the preceding channel information, or may obtain and output the information indicating that the second channel is preceding as the preceding channel information, but preferably obtains and outputs information indicating that none of the channels is preceding as the preceding channel information.

[[[Decoded Sound Common Signal Estimation Unit 1251]]]

The first channel decoded sound signal ^({circumflex over ( )})X₁ input to the sound signal purification device 1201, the second channel decoded sound signal ^({circumflex over ( )})X₂ input to the sound signal purification device 1201, the inter-channel correlation coefficient γ output by the inter-channel relationship information estimation unit 1231, and the preceding channel information output by the inter-channel relationship information estimation unit 1231 are input to the decoded sound common signal estimation unit 1251. The decoded sound common signal estimation unit 1251 performs weighted averaging on the first channel decoded sound signal ^({circumflex over ( )})X₁ and the second channel decoded sound signal ^({circumflex over ( )})X₂ to obtain the decoded sound common signal ^({circumflex over ( )})Y_(M) such that the larger the inter-channel correlation coefficient γ is, the more heavily the decoded sound signal of the preceding channel out of the first channel decoded sound signal ^({circumflex over ( )})X₁ and the second channel decoded sound signal ^({circumflex over ( )})X₂ is weighted in the decoded sound common signal ^({circumflex over ( )})Y_(M), and outputs the decoded sound common signal ^({circumflex over ( )})Y_(M) (step S1251B).

For example, the decoded sound common signal estimation unit 1251 is only required to weight and add the first channel decoded sound signal ^({circumflex over ( )})x₁(t) and the second channel decoded sound signal ^({circumflex over ( )})x₂(t) for each corresponding sample number t by using the weight determined by the inter-channel correlation coefficient γ, to obtain the decoded sound common signal ^({circumflex over ( )})y_(M)(t). Specifically, in a case where the preceding channel information is the information indicating that the first channel is preceding, that is, in a case where the first channel is preceding, the decoded sound common signal estimation unit 1251 is only required to obtain ^({circumflex over ( )})y_(M)(t)=((1+γ)/2)×^({circumflex over ( )})x₁(t)+((1−γ)/2)×^({circumflex over ( )})x₂(t) as the decoded sound common signal ^({circumflex over ( )})y_(M)(t) for each sample number t. That is, in a case where the first channel is preceding, the decoded sound common signal estimation unit 1251 is only required to obtain a sequence based on ^({circumflex over ( )})y_(M)(t)=((1+γ)/2)×^({circumflex over ( )})x₁(t)+((1−γ)/2)×^({circumflex over ( )})x₂(t) as the decoded sound common signal ^({circumflex over ( )})Y_(M). In a case where the preceding channel information is the information indicating that the second channel is preceding, that is, in a case where the second channel is preceding, the decoded sound common signal estimation unit 1251 is only required to obtain ^({circumflex over ( )})y_(M)(t)=((1−γ)/2)×^({circumflex over ( )})x₁(t)+((1+γ)/2)×^({circumflex over ( )})x₂(t) as the decoded sound common signal ^({circumflex over ( )})y_(M)(t) for each sample number t. That is, in a case where the second channel is preceding, the decoded sound common signal estimation unit 1251 is only required to obtain a sequence based on ^({circumflex over ( )})y_(M)(t)=((1−γ)/2)×^({circumflex over ( )})x₁(t)+((1+γ)/2)×^({circumflex over ( )})x₂(t) as the decoded sound common signal ^({circumflex over ( )})Y_(M).

Note that, in a case where the preceding channel information indicates that no channel is preceding, the decoded sound common signal estimation unit 1251 is only required to obtain ^({circumflex over ( )})y_(M)(t)=(^({circumflex over ( )})x₁(t)+^({circumflex over ( )})x₂(t))/2 obtained by averaging the first channel decoded sound signal ^({circumflex over ( )})x₁(t) and the second channel decoded sound signal ^({circumflex over ( )})x₂(t) for each sample number t as the decoded sound common signal ^({circumflex over ( )})y_(M)(t). That is, in a case where none of the channels is preceding, the decoded sound common signal estimation unit 1251 is only required to obtain a sequence based on ^({circumflex over ( )})y_(M)(t)=(^({circumflex over ( )})x₁(t)+^({circumflex over ( )})x₂(t))/2 as the decoded sound common signal ^({circumflex over ( )})Y_(M).
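The three cases above (first channel preceding, second channel preceding, no channel preceding) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment; the function name `common_signal` and the encoding of the preceding channel information as `1`, `2`, or `None` are assumptions made only for this sketch.

```python
def common_signal(x1, x2, gamma, preceding):
    """Weighted average of two decoded channel frames.

    x1, x2    : lists of T decoded samples for channel 1 and channel 2
    gamma     : inter-channel correlation coefficient
    preceding : 1, 2, or None (hypothetical encoding of the
                preceding channel information)
    """
    if preceding == 1:
        w1, w2 = (1 + gamma) / 2, (1 - gamma) / 2
    elif preceding == 2:
        w1, w2 = (1 - gamma) / 2, (1 + gamma) / 2
    else:
        # No channel is preceding: plain average of the two channels.
        w1 = w2 = 0.5
    return [w1 * a + w2 * b for a, b in zip(x1, x2)]
```

With γ close to 1 the preceding channel dominates the common signal, and with γ equal to 0 all three cases reduce to the plain average.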

[Common Signal Purification Weight Estimation Unit 1211]

The common signal purification weight estimation unit 1211 obtains and outputs a common signal purification weight α_(M) (step S1211). The common signal purification weight estimation unit 1211 obtains the common signal purification weight α_(M) by a method similar to the method based on the principle of minimizing the quantization error described in the first embodiment. The common signal purification weight α_(M) obtained by the common signal purification weight estimation unit 1211 is a value of 0 or more and 1 or less. However, since the common signal purification weight estimation unit 1211 obtains the common signal purification weight α_(M) for the each frame by the method to be described later, the common signal purification weight α_(M) does not become 0 or 1 in all the frames. That is, there is a frame in which the common signal purification weight α_(M) is a value larger than 0 and smaller than 1. In other words, in at least any one of all the frames, the common signal purification weight α_(M) is a value larger than 0 and smaller than 1.

Specifically, as in the following first to seventh examples, the common signal purification weight estimation unit 1211 obtains the common signal purification weight α_(M) by using the decoded sound common signal ^({circumflex over ( )})Y_(M) instead of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) at a position where the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) is used in the method based on the principle of minimizing the quantization error described in the first embodiment, and by using the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS instead of the number of bits b_(n) at a position where the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS is used in the method based on the principle of minimizing the quantization error described in the first embodiment. That is, in the following first to seventh examples, the number of bits b_(M) of the monaural code CM and the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS are used. Since the method for specifying the number of bits b_(M) of the monaural code CM is the same as that of the first embodiment, a method for specifying the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS will be described before describing the first to seventh examples. The decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} output by the decoded sound common signal estimation unit 1251 and the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} input to the sound signal purification device 1201 are input to the common signal purification weight estimation unit 1211 as necessary as indicated by a one-dot chain line in FIG. 9.

[Method for Specifying Number of Bits b_(m) in Number of Bits of Stereo Code CS]

[[First Method for Specifying Number of Bits b_(m) in Number of Bits of Stereo Code CS]]

The common signal purification weight estimation unit 1211 uses a value obtained by multiplying the number of bits b_(s) of the stereo code CS by a predetermined value larger than 0 and smaller than 1 as b_(m). That is, in a case where the number of bits b_(s) of the stereo code CS in the decoding method used by the stereo decoding unit 620 is the same in all the frames, a value obtained by multiplying the number of bits b_(s) of the stereo code CS by a predetermined value larger than 0 and smaller than 1 is only required to be stored as the number of bits b_(m) in the storage unit, which is not illustrated, in the common signal purification weight estimation unit 1211. In a case where the number of bits b_(s) of the stereo code CS in the decoding method used by the stereo decoding unit 620 is different depending on the frame, the common signal purification weight estimation unit 1211 is only required to obtain a value obtained by multiplying the number of bits b_(s) by a predetermined value larger than 0 and smaller than 1 as b_(m). For example, the common signal purification weight estimation unit 1211 is only required to use the reciprocal of the number of channels as the predetermined value larger than 0 and smaller than 1. That is, the common signal purification weight estimation unit 1211 may use a value obtained by dividing the number of bits b_(s) of the stereo code CS by the number of channels as b_(m).

[[Second Method for Specifying Number of Bits b_(m) in Number of Bits of Stereo Code CS]]

The common signal purification weight estimation unit 1211 may estimate b_(m) for the each frame using the inter-channel correlation coefficient γ. In a case where the correlation between the channels is high, most of the number of bits b_(s) of the stereo code CS is used to express a signal component common between the channels, and in a case where the correlation between the channels is low, it is expected that the number of bits close to an equal number with respect to the number of channels is used. Therefore, in the second method, the common signal purification weight estimation unit 1211 is only required to obtain a value closer to the number of bits b_(s) as b_(m) as the inter-channel correlation coefficient γ is closer to 1, and is only required to obtain a value closer to a value obtained by dividing b_(s) by the number of channels as b_(m) as the inter-channel correlation coefficient γ is closer to 0. Note that, in a case where the second method is used, the sound signal purification device 1201 also includes the inter-channel relationship information estimation unit 1231 as indicated by a broken line in FIG. 9 in order to obtain the inter-channel correlation coefficient γ, and the inter-channel relationship information estimation unit 1231 obtains the inter-channel correlation coefficient γ as described above in the description of [[Second Method for Obtaining Decoded Sound Common Component Signal]] and the description of the inter-channel relationship information estimation unit 1132 of the second embodiment.
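The second method only requires that b_(m) approach b_(s) as γ approaches 1 and approach b_(s) divided by the number of channels as γ approaches 0; it does not fix a particular mapping. The sketch below assumes a linear interpolation between those two end points, which is one possible choice and not necessarily the one intended. The function name `estimate_common_bits` is hypothetical.

```python
def estimate_common_bits(b_s, gamma, num_channels=2):
    """Estimate b_m from the stereo code size b_s and gamma.

    Assumption: linear interpolation between b_s / num_channels
    (gamma = 0) and b_s (gamma = 1). The text only constrains the
    end-point behaviour, not the shape of the mapping.
    """
    low = b_s / num_channels
    return low + gamma * (b_s - low)
```

Setting `gamma` to 0 reproduces the example given for the first method, in which b_(s) is divided by the number of channels.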

First Example

The common signal purification weight estimation unit 1211 of the first example obtains the common signal purification weight α_(M) by the following Expression (4-5) using the number of samples T per frame, the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS, and the number of bits b_(M) of the monaural code CM.

[Math. 28]

$$\alpha_{M} = \frac{2^{-\frac{2b_{m}}{T}}}{2^{-\frac{2b_{m}}{T}} + 2^{-\frac{2b_{M}}{T}}} \tag{4-5}$$
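As a numerical illustration of Expression (4-5), the following sketch evaluates the weight directly from b_(m), b_(M), and T. The function name `purification_weight` is an assumption made for this sketch.

```python
def purification_weight(b_m, b_M, T):
    """Evaluate Expression (4-5).

    b_m : bits of the stereo code CS attributed to the common signal
    b_M : bits of the monaural code CM
    T   : number of samples per frame
    """
    stereo_term = 2.0 ** (-2.0 * b_m / T)
    mono_term = 2.0 ** (-2.0 * b_M / T)
    return stereo_term / (stereo_term + mono_term)
```

For b_(m) equal to b_(M) the result is 0.5, it moves toward 0 as b_(m) grows relative to b_(M), and it moves toward 1 as b_(M) grows relative to b_(m).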

Second Example

The common signal purification weight estimation unit 1211 of the second example uses at least the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS and the number of bits b_(M) of the monaural code CM to obtain a value that is larger than 0 and smaller than 1, 0.5 when b_(m) and b_(M) are equal, closer to 0 than 0.5 as b_(m) is larger than b_(M), and closer to 1 than 0.5 as b_(M) is larger than b_(m) as the common signal purification weight α_(M).

Third Example

The common signal purification weight estimation unit 1211 of the third example obtains a value c_(M)×r_(M) obtained by multiplying the correction coefficient c_(M) obtained by

[Math. 29]

$$c_{M} = \frac{2^{-\frac{2b_{m}}{T}}}{2^{-\frac{2b_{m}}{T}} + 2^{-\frac{2b_{M}}{T}}} \tag{4-8}$$

using the number of samples T per frame, the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS, and the number of bits b_(M) of the monaural code CM by a normalized inner product value r_(M) for the monaural decoded sound signal ^({circumflex over ( )})X_(M) of the decoded sound common signal ^({circumflex over ( )})Y_(M) as the common signal purification weight α_(M).

The common signal purification weight estimation unit 1211 of the third example obtains the common signal purification weight α_(M) by performing, for example, the following steps S1211-31-n to S1211-33-n. The common signal purification weight estimation unit 1211 first obtains the normalized inner product value r_(M) for the monaural decoded sound signal ^({circumflex over ( )})X_(M) of the decoded sound common signal ^({circumflex over ( )})Y_(M) by the following Expression (4-6) from the decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} and the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} (step S1211-31-n).

[Math. 30]

$$r_{M} = \frac{\sum_{t=1}^{T} \hat{y}_{M}(t)\,\hat{x}_{M}(t)}{\sum_{t=1}^{T} \hat{x}_{M}(t)\,\hat{x}_{M}(t)} \tag{4-6}$$

The common signal purification weight estimation unit 1211 also obtains the correction coefficient c_(M) by Expression (4-8) using the number of samples T per frame, the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS, and the number of bits b_(M) of the monaural code CM (step S1211-32-n). Next, the common signal purification weight estimation unit 1211 obtains the value c_(M)×r_(M) obtained by multiplying the normalized inner product value r_(M) obtained in step S1211-31-n by the correction coefficient c_(M) obtained in step S1211-32-n as the common signal purification weight α_(M) (step S1211-33-n).
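The three steps of the third example can be sketched as follows, under the assumption that a frame is held as a plain list of T samples. The function names are hypothetical, and the zero-energy guard in `normalized_inner_product` is an addition of this sketch that the text does not specify.

```python
def normalized_inner_product(y_M, x_M):
    """Expression (4-6): inner product of y_M and x_M over the energy of x_M."""
    num = sum(y * x for y, x in zip(y_M, x_M))
    den = sum(x * x for x in x_M)
    # Assumption: return 0 for an all-zero monaural frame to avoid
    # division by zero; the text does not cover this case.
    return num / den if den > 0.0 else 0.0


def correction_coefficient(b_m, b_M, T):
    """Expression (4-8)."""
    stereo_term = 2.0 ** (-2.0 * b_m / T)
    mono_term = 2.0 ** (-2.0 * b_M / T)
    return stereo_term / (stereo_term + mono_term)


def weight_third_example(y_M, x_M, b_m, b_M):
    """alpha_M = c_M * r_M for one frame."""
    T = len(x_M)
    r_M = normalized_inner_product(y_M, x_M)
    c_M = correction_coefficient(b_m, b_M, T)
    return c_M * r_M
```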

Fourth Example

The common signal purification weight estimation unit 1211 of the fourth example uses the number of bits corresponding to the common signal in the number of bits of the stereo code CS as b_(m) and the number of bits of the monaural code CM as b_(M) to obtain the value c_(M)×r_(M) obtained by multiplying r_(M) that is a value of 0 or more and 1 or less, closer to 1 as the correlation between the decoded sound common signal ^({circumflex over ( )})Y_(M) and the monaural decoded sound signal ^({circumflex over ( )})X_(M) is higher, and closer to 0 as the correlation is lower by the correction coefficient c_(M) that is a value larger than 0 and smaller than 1, 0.5 when b_(m) and b_(M) are equal, closer to 0 than 0.5 as b_(m) is larger than b_(M), and closer to 1 than 0.5 as b_(m) is smaller than b_(M), as the common signal purification weight α_(M).

Fifth Example

The common signal purification weight estimation unit 1211 of the fifth example obtains the common signal purification weight α_(M) by performing the following steps S1211-51 to S1211-55.

The common signal purification weight estimation unit 1211 first obtains the inner product value E_(m)(0) to be used in the current frame by the following Expression (4-9) using the decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)}, the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)}, and the inner product value E_(m)(−1) that has been used in the previous frame (step S1211-51).

[Math. 31]

$$E_{m}(0) = \varepsilon_{m} E_{m}(-1) + \frac{(1 - \varepsilon_{m})}{T} \sum_{t=1}^{T} \hat{y}_{M}(t)\,\hat{x}_{M}(t) \tag{4-9}$$

Here, ε_(m) is a predetermined value larger than 0 and smaller than 1, and is stored in advance in the common signal purification weight estimation unit 1211. Note that the common signal purification weight estimation unit 1211 stores the obtained inner product value E_(m)(0) in the common signal purification weight estimation unit 1211 in order to use this inner product value E_(m)(0) as the inner product value E_(m)(−1) that has been used in the previous frame in the next frame.

The common signal purification weight estimation unit 1211 also obtains the energy E_(M)(0) of the monaural decoded sound signal to be used in the current frame by the following Expression (4-10) using the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} and the energy E_(M)(−1) of the monaural decoded sound signal that has been used in the previous frame (step S1211-52).

[Math. 32]

$$E_{M}(0) = \varepsilon_{M} E_{M}(-1) + \frac{(1 - \varepsilon_{M})}{T} \sum_{t=1}^{T} \hat{x}_{M}(t)\,\hat{x}_{M}(t) \tag{4-10}$$

Here, ε_(M) is a predetermined value larger than 0 and smaller than 1, and is stored in advance in the common signal purification weight estimation unit 1211. Note that the common signal purification weight estimation unit 1211 stores the obtained energy E_(M)(0) of the monaural decoded sound signal in the common signal purification weight estimation unit 1211 in order to use this energy E_(M)(0) as “the energy E_(M)(−1) of the monaural decoded sound signal that has been used in the previous frame” in the next frame.

Next, the common signal purification weight estimation unit 1211 obtains the normalized inner product value r_(M) by the following Expression (4-11) using the inner product value E_(m)(0) to be used in the current frame obtained in step S1211-51 and the energy E_(M)(0) of the monaural decoded sound signal used in the current frame obtained in step S1211-52 (step S1211-53).

[Math. 33]

$$r_{M} = E_{m}(0) / E_{M}(0) \tag{4-11}$$

The common signal purification weight estimation unit 1211 also obtains the correction coefficient c_(M) by Expression (4-8) (step S1211-54). Next, the common signal purification weight estimation unit 1211 obtains the value c_(M)×r_(M) obtained by multiplying the normalized inner product value r_(M) obtained in step S1211-53 by the correction coefficient c_(M) obtained in step S1211-54, as the common signal purification weight α_(M) (step S1211-55).

That is, the common signal purification weight estimation unit 1211 of the fifth example obtains, as the common signal purification weight α_(M), the value c_(M)×r_(M) obtained by multiplying the normalized inner product value r_(M) by the correction coefficient c_(M). Here, the normalized inner product value r_(M) is obtained by Expression (4-11) using the inner product value E_(m)(0) and the energy E_(M)(0) of the monaural decoded sound signal. The inner product value E_(m)(0) is obtained by Expression (4-9) using each sample value ^({circumflex over ( )})y_(M)(t) of the decoded sound common signal ^({circumflex over ( )})Y_(M), each sample value ^({circumflex over ( )})x_(M)(t) of the monaural decoded sound signal ^({circumflex over ( )})X_(M), and the inner product value E_(m)(−1) of the previous frame. The energy E_(M)(0) of the monaural decoded sound signal is obtained by Expression (4-10) using each sample value ^({circumflex over ( )})x_(M)(t) of the monaural decoded sound signal ^({circumflex over ( )})X_(M) and the energy E_(M)(−1) of the monaural decoded sound signal of the previous frame. The correction coefficient c_(M) is obtained by Expression (4-8) using the number of samples T per frame, the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS, and the number of bits b_(M) of the monaural code CM.
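Because the fifth example carries E_(m) and E_(M) from frame to frame, it is naturally expressed as a small stateful object. The sketch below is illustrative only: the class name, the choice of 0 as the initial value of both stored quantities, and the zero-energy guard are assumptions that the text does not specify.

```python
class SmoothedWeightEstimator:
    """Per-frame alpha_M following steps S1211-51 to S1211-55."""

    def __init__(self, eps_m, eps_M):
        self.eps_m = eps_m      # smoothing factor for the inner product, 0 < eps_m < 1
        self.eps_M = eps_M      # smoothing factor for the energy, 0 < eps_M < 1
        self.E_m_prev = 0.0     # assumed initial E_m(-1)
        self.E_M_prev = 0.0     # assumed initial E_M(-1)

    def update(self, y_M, x_M, b_m, b_M):
        T = len(x_M)
        inner = sum(y * x for y, x in zip(y_M, x_M))
        energy = sum(x * x for x in x_M)
        # Expressions (4-9) and (4-10): recursive smoothing across frames.
        E_m = self.eps_m * self.E_m_prev + (1.0 - self.eps_m) / T * inner
        E_M = self.eps_M * self.E_M_prev + (1.0 - self.eps_M) / T * energy
        self.E_m_prev, self.E_M_prev = E_m, E_M
        # Expression (4-11); the guard for E_M == 0 is an assumption.
        r_M = E_m / E_M if E_M > 0.0 else 0.0
        # Expression (4-8).
        s = 2.0 ** (-2.0 * b_m / T)
        m = 2.0 ** (-2.0 * b_M / T)
        c_M = s / (s + m)
        return c_M * r_M
```

One estimator instance would be kept per decoded stream and `update` called once per frame, so that the stored values play the role of E_(m)(−1) and E_(M)(−1) in the next call.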

Sixth Example

The common signal purification weight estimation unit 1211 of the sixth example obtains the value λ×c_(M)×r_(M) obtained by multiplying the normalized inner product value r_(M) and the correction coefficient c_(M) described in the third example or the normalized inner product value r_(M) and the correction coefficient c_(M) described in the fifth example by λ that is a predetermined value larger than 0 and smaller than 1 as the common signal purification weight α_(M).

Seventh Example

The common signal purification weight estimation unit 1211 of the seventh example obtains the value γ×c_(M)×r_(M) obtained by multiplying the normalized inner product value r_(M) and the correction coefficient c_(M) described in the third example or the normalized inner product value r_(M) and the correction coefficient c_(M) described in the fifth example by the inter-channel correlation coefficient γ that is the correlation coefficient between the first channel decoded sound signal and the second channel decoded sound signal, as the common signal purification weight α_(M). The sound signal purification device 1201 of the seventh example also includes the inter-channel relationship information estimation unit 1231 as indicated by a broken line in FIG. 9 in order to obtain the inter-channel correlation coefficient γ, and the inter-channel relationship information estimation unit 1231 obtains the inter-channel correlation coefficient γ as described above in the description of the [[Second Method for Obtaining Decoded Sound Common Component Signal]] and the description of the inter-channel relationship information estimation unit 1132 of the second embodiment.

[Common Signal Purification Unit 1221]

The decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} output by the decoded sound common signal estimation unit 1251, the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} input to the sound signal purification device 1201, and the common signal purification weight α_(M) output by the common signal purification weight estimation unit 1211 are input to the common signal purification unit 1221. For each corresponding sample t, the common signal purification unit 1221 obtains and outputs a sequence based on a value ^(˜)y_(M)(t) obtained by adding a value α_(M)×^({circumflex over ( )})x_(M)(t) obtained by multiplying the common signal purification weight α_(M) by the sample value ^({circumflex over ( )})x_(M)(t) of the monaural decoded sound signal ^({circumflex over ( )})X_(M) and a value (1−α_(M))×^({circumflex over ( )})y_(M)(t) obtained by multiplying a value (1−α_(M)) obtained by subtracting the common signal purification weight α_(M) from 1 by the sample value ^({circumflex over ( )})y_(M)(t) of the decoded sound common signal ^({circumflex over ( )})Y_(M), as a purified common signal ^(˜)Y_(M)={^(˜)y_(M)(1), ^(˜)y_(M)(2), . . . , ^(˜)y_(M)(T)} (step S1221). That is, ^(˜)y_(M)(t)=(1−α_(M))×^({circumflex over ( )})y_(M)(t)+α_(M)×^({circumflex over ( )})x_(M)(t).
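Step S1221 is a per-sample convex combination of the two signals. A minimal sketch, with the hypothetical function name `purify_common`:

```python
def purify_common(y_M, x_M, alpha_M):
    """Purified common signal: (1 - alpha_M) * y_M(t) + alpha_M * x_M(t).

    y_M     : decoded sound common signal for one frame
    x_M     : monaural decoded sound signal for the same frame
    alpha_M : common signal purification weight, 0 <= alpha_M <= 1
    """
    return [(1.0 - alpha_M) * y + alpha_M * x for y, x in zip(y_M, x_M)]
```

With `alpha_M` equal to 0 the decoded sound common signal passes through unchanged, and with `alpha_M` equal to 1 it is replaced by the monaural decoded sound signal.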

[n-Th Channel Separation Combination Weight Estimation Unit 1281-n]

The n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} input to the sound signal purification device 1201 and the decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} output by the decoded sound common signal estimation unit 1251 are input to the n-th channel separation combination weight estimation unit 1281-n. The n-th channel separation combination weight estimation unit 1281-n obtains a normalized inner product value for the decoded sound common signal ^({circumflex over ( )})Y_(M) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) from the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the decoded sound common signal ^({circumflex over ( )})Y_(M) as an n-th channel separation combination weight β_(n) (step S1281-n). Specifically, the n-th channel separation combination weight β_(n) is as represented by Expression (43).

[Math. 34]

$$\beta_{n} = \frac{\sum_{t=1}^{T} \hat{x}_{n}(t)\,\hat{y}_{M}(t)}{\sum_{t=1}^{T} \hat{y}_{M}(t)\,\hat{y}_{M}(t)} \tag{43}$$

[n-Th Channel Separation Combination Unit 1291-n]

The n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} input to the sound signal purification device 1201, the decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} output by the decoded sound common signal estimation unit 1251, the purified common signal ^(˜)Y_(M)={^(˜)y_(M)(1), ^(˜)y_(M)(2), . . . , ^(˜)y_(M)(T)} output by the common signal purification unit 1221, and the n-th channel separation combination weight β_(n) output by the n-th channel separation combination weight estimation unit 1281-n are input to the n-th channel separation combination unit 1291-n. For each corresponding sample t, the n-th channel separation combination unit 1291-n obtains and outputs a sequence based on a value ^(˜)x_(n)(t) obtained by subtracting a value β_(n)×^({circumflex over ( )})y_(M)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by the sample value ^({circumflex over ( )})y_(M)(t) of the decoded sound common signal ^({circumflex over ( )})Y_(M) from the sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and adding a value β_(n)×^(˜)y_(M)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by a sample value ^(˜)y_(M)(t) of the purified common signal ^(˜)Y_(M), as the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)} (step S1291-n). That is, ^(˜)x_(n)(t)=^({circumflex over ( )})x_(n)(t)−β_(n)×^({circumflex over ( )})y_(M)(t)+β_(n)×^(˜)y_(M)(t).
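Steps S1281-n and S1291-n together remove the component of the decoded channel that is aligned with the decoded sound common signal and put back the same proportion of the purified common signal. The following sketch illustrates this for one channel and one frame; the function names and the zero-energy guard are assumptions of the sketch.

```python
def separation_weight(x_n, y_M):
    """Expression (43): normalized inner product of x_n with respect to y_M."""
    num = sum(x * y for x, y in zip(x_n, y_M))
    den = sum(y * y for y in y_M)
    # Assumption: use weight 0 when the common signal has no energy.
    return num / den if den > 0.0 else 0.0


def separation_combination(x_n, y_M, y_M_purified, beta_n):
    """Purified channel: x_n(t) - beta_n * y_M(t) + beta_n * y~_M(t)."""
    return [x - beta_n * y + beta_n * yp
            for x, y, yp in zip(x_n, y_M, y_M_purified)]
```

If the purified common signal equals the decoded sound common signal, the two β_(n) terms cancel and the channel is returned unchanged.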

Modification Example of Fourth Embodiment

In a case where the sound signal purification device 1201 uses the inter-channel relationship information and the stereo decoding unit 620 of the decoding device 600 obtains at least one piece of the inter-channel relationship information used by the sound signal purification device 1201, the inter-channel relationship information obtained by the stereo decoding unit 620 of the decoding device 600 may be input to the sound signal purification device 1201, and the sound signal purification device 1201 may use the input inter-channel relationship information.

In addition, in a case where the sound signal purification device 1201 uses the inter-channel relationship information and at least one piece of the inter-channel relationship information used by the sound signal purification device 1201 is included in the inter-channel relationship information code CC obtained and output by the inter-channel relationship information encoding unit, which is not illustrated, included in the encoding device 500 described above, a code representing the inter-channel relationship information used by the sound signal purification device 1201 included in the inter-channel relationship information code CC may be input to the sound signal purification device 1201, the sound signal purification device 1201 may include an inter-channel relationship information decoding unit, which is not illustrated, and the inter-channel relationship information decoding unit may decode the code representing the inter-channel relationship information to obtain and output the inter-channel relationship information.

That is, in a case where all pieces of the inter-channel relationship information used by the sound signal purification device 1201 are input to the sound signal purification device 1201 or obtained by the inter-channel relationship information decoding unit, the sound signal purification device 1201 does not need to include the inter-channel relationship information estimation unit 1231.

Fifth Embodiment

Similarly to the sound signal purification device of the fourth embodiment, a sound signal purification device of a fifth embodiment also improves the decoded sound signal of the each channel of the stereo by using a monaural decoded sound signal obtained from a code different from the code from which the decoded sound signal is obtained. The sound signal purification device of the fifth embodiment is different from the sound signal purification device of the fourth embodiment in that a signal obtained by upmixing the monaural decoded sound signal for the each channel is used instead of the monaural decoded sound signal itself, and a signal obtained by upmixing the decoded sound common signal for the each channel is used instead of the decoded sound common signal itself. Hereinafter, regarding the sound signal purification device of the fifth embodiment, differences from the sound signal purification device of the fourth embodiment will be mainly described with reference to the sound signal purification devices of the above-described embodiments as appropriate, using an example in a case where the number of channels of the stereo is two.

<<Sound Signal Purification Device 1202>>

As illustrated in FIG. 11, a sound signal purification device 1202 of the fifth embodiment includes an inter-channel relationship information estimation unit 1232, the decoded sound common signal estimation unit 1251, the common signal purification weight estimation unit 1211, the common signal purification unit 1221, a decoded sound common signal upmixing unit 1262, a purified common signal upmixing unit 1272, a first channel separation combination weight estimation unit 1282-1, a first channel separation combination unit 1292-1, a second channel separation combination weight estimation unit 1282-2, and a second channel separation combination unit 1292-2. For the each frame, as illustrated in FIG. 12, the sound signal purification device 1202 performs steps S1232, S1251, S1211, S1221, S1262, and S1272, and steps S1282-n and S1292-n for the each channel.

[Inter-Channel Relationship Information Estimation Unit 1232]

At least the first channel decoded sound signal ^({circumflex over ( )})X₁ input to the sound signal purification device 1202 and the second channel decoded sound signal ^({circumflex over ( )})X₂ input to the sound signal purification device 1202 are input to the inter-channel relationship information estimation unit 1232. The inter-channel relationship information estimation unit 1232 obtains and outputs the inter-channel relationship information by using at least the first channel decoded sound signal ^({circumflex over ( )})X₁ and the second channel decoded sound signal ^({circumflex over ( )})X₂ (step S1232). The inter-channel relationship information is information indicating a relationship between the channels of the stereo. Examples of the inter-channel relationship information are the inter-channel time difference τ, the inter-channel correlation coefficient γ, and the preceding channel information. The inter-channel relationship information estimation unit 1232 may obtain a plurality of types of the inter-channel relationship information and, for example, may obtain the inter-channel time difference τ, the inter-channel correlation coefficient γ, and the preceding channel information. As a method of the inter-channel relationship information estimation unit 1232 to obtain the inter-channel time difference τ and a method thereof to obtain the inter-channel correlation coefficient γ, for example, it is only required that the methods described above in the description of the inter-channel relationship information estimation unit 1132 of the second embodiment are used. In a case where the decoded sound common signal estimation unit 1251 uses the preceding channel information, the inter-channel relationship information estimation unit 1232 obtains the preceding channel information. As a method of the inter-channel relationship information estimation unit 1232 to obtain the preceding channel information, for example, it is only required that the method described above in the description of the inter-channel relationship information estimation unit 1231 of the fourth embodiment is used. Note that the inter-channel time difference τ obtained by the method described above in the description of the inter-channel relationship information estimation unit 1132 includes the information indicating the number of samples |τ| corresponding to the time difference between the first channel and the second channel and the information indicating which channel of the first channel and the second channel is preceding, and thus, in a case where the inter-channel relationship information estimation unit 1232 also obtains and outputs the preceding channel information, information indicating the number of samples |τ| corresponding to the time difference between the first channel and the second channel may be obtained and output instead of the inter-channel time difference τ.

[Decoded Sound Common Signal Estimation Unit 1251]

The decoded sound common signal estimation unit 1251 obtains and outputs the decoded sound common signal ^({circumflex over ( )})Y_(M) similarly to the decoded sound common signal estimation unit 1251 of the fourth embodiment (step S1251).

[Common Signal Purification Weight Estimation Unit 1211]

The common signal purification weight estimation unit 1211 obtains and outputs the common signal purification weight α_(M) similarly to the common signal purification weight estimation unit 1211 of the fourth embodiment (step S1211).

[Common Signal Purification Unit 1221]

The common signal purification unit 1221 obtains and outputs the purified common signal ^(˜)Y_(M) similarly to the common signal purification unit 1221 of the fourth embodiment (step S1221).

[Decoded Sound Common Signal Upmixing Unit 1262]

At least the decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} output by the decoded sound common signal estimation unit 1251 and the inter-channel relationship information output by the inter-channel relationship information estimation unit 1232 are input to the decoded sound common signal upmixing unit 1262. The decoded sound common signal upmixing unit 1262 performs the upmixing process using at least the decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} and the inter-channel relationship information, to thereby obtain and output an n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn)={^({circumflex over ( )})y_(Mn)(1), ^({circumflex over ( )})y_(Mn)(2), . . . , ^({circumflex over ( )})y_(Mn)(T)} that is a signal obtained by upmixing the decoded sound common signal for the each channel (step S1262). The decoded sound common signal upmixing unit 1262 is only required to obtain the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) by, for example, the following first method or second method.

[[First Method for Obtaining n-Th Channel Upmixed Common Signal]]

The decoded sound common signal upmixing unit 1262 obtains the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) by performing the same processing as that of the monaural decoded sound upmixing unit 1172 of the second embodiment by replacing the monaural decoded sound signal ^({circumflex over ( )})X_(M) with the decoded sound common signal ^({circumflex over ( )})Y_(M) and replacing the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) with the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn). That is, in a case where the first channel is preceding, the decoded sound common signal upmixing unit 1262 outputs the decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} without change as the first channel upmixed common signal ^({circumflex over ( )})Y_(M1)={^({circumflex over ( )})y_(M1)(1), ^({circumflex over ( )})y_(M1)(2), . . . , ^({circumflex over ( )})y_(M1)(T)}, and outputs a signal {^({circumflex over ( )})y_(M)(1−|τ|), ^({circumflex over ( )})y_(M)(2−|τ|), . . . , ^({circumflex over ( )})y_(M)(T−|τ|)} obtained by delaying the decoded sound common signal by |τ| samples as the second channel upmixed common signal ^({circumflex over ( )})Y_(M2)={^({circumflex over ( )})y_(M2)(1), ^({circumflex over ( )})y_(M2)(2), . . . , ^({circumflex over ( )})y_(M2)(T)}. In a case where the second channel is preceding, the decoded sound common signal upmixing unit 1262 outputs a signal {^({circumflex over ( )})y_(M)(1−|τ|), ^({circumflex over ( )})y_(M)(2−|τ|), . . . , ^({circumflex over ( )})y_(M)(T−|τ|)} obtained by delaying the decoded sound common signal by |τ| samples as the first channel upmixed common signal ^({circumflex over ( )})Y_(M1)={^({circumflex over ( )})y_(M1)(1), ^({circumflex over ( )})y_(M1)(2), . . . , ^({circumflex over ( )})y_(M1)(T)}, and outputs the decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} without change as the second channel upmixed common signal ^({circumflex over ( )})Y_(M2)={^({circumflex over ( )})y_(M2)(1), ^({circumflex over ( )})y_(M2)(2), . . . , ^({circumflex over ( )})y_(M2)(T)}. In a case where no channel is preceding, the decoded sound common signal upmixing unit 1262 outputs the decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} without change as the first channel upmixed common signal ^({circumflex over ( )})Y_(M1)={^({circumflex over ( )})y_(M1)(1), ^({circumflex over ( )})y_(M1)(2), . . . , ^({circumflex over ( )})y_(M1)(T)} and the second channel upmixed common signal ^({circumflex over ( )})Y_(M2)={^({circumflex over ( )})y_(M2)(1), ^({circumflex over ( )})y_(M2)(2), . . . , ^({circumflex over ( )})y_(M2)(T)}.
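The first method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names `delay` and `upmix_first_method`, the `preceding` argument (1, 2, or None), and the `history` buffer holding samples of the previous frame are all introduced here for illustration, and samples preceding the start of the frame are assumed to be zero when no history is supplied.

```python
def delay(signal, tau_abs, history=None):
    """Return {y(1-|tau|), ..., y(T-|tau|)} for a frame of T samples.

    history (hypothetical): trailing samples of the previous frame as a list.
    Past samples that are not available are assumed to be zero.
    """
    T = len(signal)
    if tau_abs == 0:
        return list(signal)
    past = list(history)[-tau_abs:] if history else []
    past = [0.0] * (tau_abs - len(past)) + past  # zero-fill missing past samples
    return (past + list(signal))[:T]


def upmix_first_method(y_m, tau_abs, preceding, history=None):
    """Return (Y_M1, Y_M2) obtained from the decoded sound common signal Y_M.

    preceding: 1 if the first channel is preceding, 2 if the second channel
    is preceding, None if no channel is preceding.
    """
    if preceding == 1:
        # First channel unchanged, second channel delayed by |tau| samples.
        return list(y_m), delay(y_m, tau_abs, history)
    if preceding == 2:
        # First channel delayed by |tau| samples, second channel unchanged.
        return delay(y_m, tau_abs, history), list(y_m)
    # No channel is preceding: both channels receive Y_M without change.
    return list(y_m), list(y_m)
```

For example, `upmix_first_method([1.0, 2.0, 3.0, 4.0], 1, 1)` leaves the first channel unchanged and shifts the second channel by one sample.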

[[Second Method for Obtaining n-Th Channel Upmixed Common Signal]]

In a case where the correlation between the channels is small, a good n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) may not be obtained merely by applying the time difference to the decoded sound common signal ^({circumflex over ( )})Y_(M) as in the first method. Accordingly, in the second method, the decoded sound common signal upmixing unit 1262 obtains the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) by taking the weighted average of the decoded sound common signal ^({circumflex over ( )})Y_(M) and the decoded sound signal ^({circumflex over ( )})X_(n) of the each channel in consideration of the correlation between the channels. In the second method, the decoded sound common signal upmixing unit 1262 first uses each of the n-th channel upmixed common signals ^({circumflex over ( )})Y_(Mn)={^({circumflex over ( )})y_(Mn)(1), ^({circumflex over ( )})y_(Mn)(2), . . . , ^({circumflex over ( )})y_(Mn)(T)} obtained by the first method as a temporary n-th channel upmixed common signal Y′_(Mn)={y′_(Mn)(1), y′_(Mn)(2), . . . , y′_(Mn)(T)} (that is, the same processing as the first method is performed by replacing the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) with the temporary n-th channel upmixed common signal Y′_(Mn) to obtain the temporary n-th channel upmixed common signal Y′_(Mn)={y′_(Mn)(1), y′_(Mn)(2), . . . , y′_(Mn)(T)}). The decoded sound common signal upmixing unit 1262 then obtains, for each corresponding sample t, a sequence based on ^({circumflex over ( )})y_(Mn)(t) obtained by the following Expression (51) using the sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal, the sample value y′_(Mn)(t) of the temporary n-th channel upmixed common signal, and the inter-channel correlation coefficient γ, as the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn)={^({circumflex over ( )})y_(Mn)(1), ^({circumflex over ( )})y_(Mn)(2), . . . , ^({circumflex over ( )})y_(Mn)(T)}.

$\begin{matrix}\lbrack {{Math}.35} \rbrack &  \\{{{\hat{y}}_{Mn}(t)} = {{( {1 - \gamma} ){{\hat{x}}_{n}(t)}} + {\gamma{y_{Mn}^{\prime}(t)}}}} & (51)\end{matrix}$

Note that, in a case where the decoded sound common signal upmixing unit 1262 performs the second method, the first channel decoded sound signal input to the sound signal purification device 1202 and the second channel decoded sound signal input to the sound signal purification device 1202 are also input to the decoded sound common signal upmixing unit 1262 as indicated by a broken line in FIG. 11.
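The weighted average of Expression (51) can be sketched in Python as follows. This is only an illustrative sketch: the function name `upmix_second_method` and its argument names are assumptions introduced here, and the temporary n-th channel upmixed common signal is assumed to have been obtained beforehand by the first method.

```python
def upmix_second_method(x_n, y_tmp_mn, gamma):
    """Apply Expression (51) sample by sample.

    x_n:      samples of the n-th channel decoded sound signal
    y_tmp_mn: samples of the temporary n-th channel upmixed common signal
              (assumed to come from the first method)
    gamma:    inter-channel correlation coefficient
    """
    return [(1.0 - gamma) * x + gamma * y for x, y in zip(x_n, y_tmp_mn)]
```

With γ equal to 1 the result equals the temporary signal of the first method, and with γ equal to 0 the result equals the n-th channel decoded sound signal itself.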

[Purified Common Signal Upmixing Unit 1272]

The purified common signal ^(˜)Y_(M)={^(˜)y_(M)(1), ^(˜)y_(M)(2), . . . , ^(˜)y_(M)(T)} output by the common signal purification unit 1221 and the inter-channel relationship information output by the inter-channel relationship information estimation unit 1232 are input to the purified common signal upmixing unit 1272. The purified common signal upmixing unit 1272 performs the upmixing process using the purified common signal ^(˜)Y_(M)={^(˜)y_(M)(1), ^(˜)y_(M)(2), . . . , ^(˜)y_(M)(T)} and the inter-channel relationship information, to thereby obtain and output an n-th channel upmixed purified signal ^(˜)Y_(Mn)={^(˜)y_(Mn)(1), ^(˜)y_(Mn)(2), . . . , ^(˜)y_(Mn)(T)} that is a signal obtained by upmixing the purified common signal for the each channel (step S1272). The purified common signal upmixing unit 1272 is only required to perform the same processing as that of the monaural decoded sound upmixing unit 1172 of the second embodiment by replacing the monaural decoded sound signal ^({circumflex over ( )})X_(M) with the purified common signal ^(˜)Y_(M) and replacing the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) with the n-th channel upmixed purified signal ^(˜)Y_(Mn).

[n-Th Channel Separation Combination Weight Estimation Unit 1282-n]

The n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} input to the sound signal purification device 1202 and the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn)={^({circumflex over ( )})y_(Mn)(1), ^({circumflex over ( )})y_(Mn)(2), . . . , ^({circumflex over ( )})y_(Mn)(T)} output by the decoded sound common signal upmixing unit 1262 are input to the n-th channel separation combination weight estimation unit 1282-n. The n-th channel separation combination weight estimation unit 1282-n obtains and outputs a normalized inner product value for the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) from the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn), as the n-th channel separation combination weight β_(n) (step S1282-n).

Specifically, the n-th channel separation combination weight β_(n) is as represented by Expression (52).

$\begin{matrix}\lbrack {{Math}.36} \rbrack &  \\{\beta_{n} = \frac{{\sum}_{t = 1}^{T}{{\hat{x}}_{n}(t)}{{\hat{y}}_{Mn}(t)}}{{\sum}_{t = 1}^{T}{{\hat{y}}_{Mn}(t)}{{\hat{y}}_{Mn}(t)}}} & (52)\end{matrix}$
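Expression (52) can be sketched in Python as follows. The function name `separation_combination_weight` is an assumption introduced for illustration, and returning 0 when the n-th channel upmixed common signal has zero energy is likewise an assumption of this sketch; the description does not specify that case.

```python
def separation_combination_weight(x_n, y_mn):
    """Normalized inner product of Expression (52).

    x_n:  samples of the n-th channel decoded sound signal
    y_mn: samples of the n-th channel upmixed common signal
    """
    numerator = sum(x * y for x, y in zip(x_n, y_mn))
    denominator = sum(y * y for y in y_mn)
    if denominator == 0.0:
        return 0.0  # assumption: zero-energy frame, not covered by the description
    return numerator / denominator
```

For example, when the n-th channel decoded sound signal is exactly twice the n-th channel upmixed common signal, the weight obtained by this sketch is 2.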

[n-Th Channel Separation Combination Unit 1292-n]

The n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} input to the sound signal purification device 1202, the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn)={^({circumflex over ( )})y_(Mn)(1), ^({circumflex over ( )})y_(Mn)(2), . . . , ^({circumflex over ( )})y_(Mn)(T)} output by the decoded sound common signal upmixing unit 1262, the n-th channel upmixed purified signal ^(˜)Y_(Mn)={^(˜)y_(Mn)(1), ^(˜)y_(Mn)(2), . . . , ^(˜)y_(Mn)(T)} output by the purified common signal upmixing unit 1272, and the n-th channel separation combination weight β_(n) output by the n-th channel separation combination weight estimation unit 1282-n are input to the n-th channel separation combination unit 1292-n. For each corresponding sample t, the n-th channel separation combination unit 1292-n obtains and outputs a sequence based on a value ^(˜)x_(n)(t) obtained by subtracting a value β_(n)×^({circumflex over ( )})y_(Mn)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by a sample value ^({circumflex over ( )})y_(Mn)(t) of the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) from the sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and adding a value β_(n)×^(˜)y_(Mn)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by a sample value ^(˜)y_(Mn)(t) of the n-th channel upmixed purified signal ^(˜)Y_(Mn), as the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)} (step S1292-n). That is, ^(˜)x_(n)(t)=^({circumflex over ( )})x_(n)(t)−β_(n)×^({circumflex over ( )})y_(Mn)(t)+β_(n)×^(˜)y_(Mn)(t).
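The per-sample operation of step S1292-n can be sketched in Python as follows. The function name `separation_combination` and its argument names are assumptions introduced for illustration only.

```python
def separation_combination(x_n, y_mn, y_pur_mn, beta_n):
    """Compute x~_n(t) = x^_n(t) - beta_n * y^_Mn(t) + beta_n * y~_Mn(t).

    x_n:      samples of the n-th channel decoded sound signal
    y_mn:     samples of the n-th channel upmixed common signal
    y_pur_mn: samples of the n-th channel upmixed purified signal
    beta_n:   n-th channel separation combination weight
    """
    return [
        x - beta_n * y + beta_n * y_pur
        for x, y, y_pur in zip(x_n, y_mn, y_pur_mn)
    ]
```

In this sketch, the common component weighted by β_(n) is removed from the decoded sound signal and replaced by its purified counterpart, so that when the upmixed purified signal equals the upmixed common signal the output equals the decoded sound signal.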

Sixth Embodiment

Similarly to the sound signal purification devices of the fourth embodiment and the fifth embodiment, a sound signal purification device of a sixth embodiment also improves the decoded sound signal of the each channel of the stereo by using a monaural decoded sound signal obtained from a code different from the code from which the decoded sound signal is obtained. The sound signal purification device of the sixth embodiment is different from the sound signal purification device of the fifth embodiment in that the inter-channel relationship information is obtained not from a decoded sound signal but from a code. Hereinafter, regarding the sound signal purification device of the sixth embodiment, differences from the sound signal purification device of the fifth embodiment will be described using an example in a case where the number of channels of the stereo is two.

<<Sound Signal Purification Device 1203>>

As illustrated in FIG. 13, the sound signal purification device 1203 of the sixth embodiment includes an inter-channel relationship information decoding unit 1243, the decoded sound common signal estimation unit 1251, the common signal purification weight estimation unit 1211, the common signal purification unit 1221, the decoded sound common signal upmixing unit 1262, the purified common signal upmixing unit 1272, the first channel separation combination weight estimation unit 1282-1, the first channel separation combination unit 1292-1, the second channel separation combination weight estimation unit 1282-2, and the second channel separation combination unit 1292-2. For the each frame, as illustrated in FIG. 14, the sound signal purification device 1203 performs steps S1243, S1251, S1211, S1221, S1262, and S1272, and steps S1282-n and S1292-n for the each channel. The sound signal purification device 1203 of the sixth embodiment is different from the sound signal purification device 1202 of the fifth embodiment in that the inter-channel relationship information decoding unit 1243 is provided instead of the inter-channel relationship information estimation unit 1232, and step S1243 is performed instead of step S1232. Further, the inter-channel relationship information code CC of the each frame is also input to the sound signal purification device 1203 of the sixth embodiment. The inter-channel relationship information code CC may be a code obtained and output by the inter-channel relationship information encoding unit, which is not illustrated, included in the above-described encoding device 500, or may be a code included in the stereo code CS obtained and output by the stereo encoding unit 530 of the above-described encoding device 500. Hereinafter, differences between the sound signal purification device 1203 of the sixth embodiment and the sound signal purification device 1202 of the fifth embodiment will be described.

[Inter-Channel Relationship Information Decoding Unit 1243]

The inter-channel relationship information code CC input to the sound signal purification device 1203 is input to the inter-channel relationship information decoding unit 1243. The inter-channel relationship information decoding unit 1243 decodes the inter-channel relationship information code CC to obtain and output the inter-channel relationship information (step S1243). The inter-channel relationship information obtained by the inter-channel relationship information decoding unit 1243 is the same as the inter-channel relationship information obtained by the inter-channel relationship information estimation unit 1232 of the fifth embodiment.

Modification Example of Sixth Embodiment

In a case where the inter-channel relationship information code CC is a code included in the stereo code CS, the same inter-channel relationship information as that obtained in step S1243 is obtained by decoding in the stereo decoding unit 620 of the decoding device 600.

Therefore, in a case where the inter-channel relationship information code CC is a code included in the stereo code CS, the inter-channel relationship information obtained by the stereo decoding unit 620 of the decoding device 600 may be input to the sound signal purification device 1203 of the sixth embodiment, and the sound signal purification device 1203 of the sixth embodiment need not include the inter-channel relationship information decoding unit 1243 and need not perform step S1243.

Further, in a case where only a part of the inter-channel relationship information code CC is a code included in the stereo code CS, it is only required that the inter-channel relationship information obtained by decoding the code included in the stereo code CS in the inter-channel relationship information code CC by the stereo decoding unit 620 of the decoding device 600 is input to the sound signal purification device 1203 of the sixth embodiment, and that the inter-channel relationship information decoding unit 1243 of the sound signal purification device 1203 of the sixth embodiment decodes, as step S1243, a code not included in the stereo code CS in the inter-channel relationship information code CC to obtain and output the inter-channel relationship information that has not been input to the sound signal purification device 1203.

In addition, in a case where the code corresponding to a part of the inter-channel relationship information used by each unit of the sound signal purification device 1203 is not included in the inter-channel relationship information code CC, the sound signal purification device 1203 of the sixth embodiment is only required to also include the inter-channel relationship information estimation unit 1232, so that the inter-channel relationship information estimation unit 1232 also performs step S1232. In this case, the inter-channel relationship information estimation unit 1232 is only required to obtain and output the inter-channel relationship information that cannot be obtained by decoding the inter-channel relationship information code CC in the inter-channel relationship information used by respective units of the sound signal purification device 1203, similarly to step S1232 of the fifth embodiment.

Seventh Embodiment

Similarly to the sound signal purification devices of the first to sixth embodiments, a sound signal purification device of a seventh embodiment also improves the decoded sound signal of the each channel of the stereo by using a monaural decoded sound signal obtained from a code different from the code from which the decoded sound signal is obtained. Hereinafter, the sound signal purification device of the seventh embodiment will be described with reference to the sound signal purification devices of the above-described embodiments as appropriate using an example in a case where the number of channels of the stereo is two.

As illustrated in FIG. 15, the sound signal purification device 1301 of the seventh embodiment includes an inter-channel relationship information estimation unit 1331, a decoded sound common signal estimation unit 1351, a decoded sound common signal upmixing unit 1361, a monaural decoded sound upmixing unit 1371, a first channel purification weight estimation unit 1311-1, a first channel signal purification unit 1321-1, a first channel separation combination weight estimation unit 1381-1, a first channel separation combination unit 1391-1, a second channel purification weight estimation unit 1311-2, a second channel signal purification unit 1321-2, a second channel separation combination weight estimation unit 1381-2, and a second channel separation combination unit 1391-2. The sound signal purification device 1301 obtains a purified upmixed signal, which is a sound signal obtained by improving an upmixed common signal, from the upmixed common signal that is a signal obtained by upmixing the decoded sound common signal that is a signal common to all channels of the decoded sound of stereo and an upmixed monaural decoded sound signal obtained by upmixing the monaural decoded sound signal for the each channel of the stereo, for example, in units of frames having a predetermined time length of 20 ms, to obtain and output a purified decoded sound signal, which is a sound signal obtained by improving the decoded sound signal, from the decoded sound signal, the upmixed common signal, and the purified upmixed signal. The decoded sound signals of the respective channels input in units of frames to the sound signal purification device 1301 are, for example, the first channel decoded sound signal ^({circumflex over ( )})X₁={^({circumflex over ( )})x₁(1), ^({circumflex over ( )})x₁(2), . . . , ^({circumflex over ( )})x₁(T)} of the T samples and the second channel decoded sound signal ^({circumflex over ( )})X₂={^({circumflex over ( )})x₂(1), ^({circumflex over ( )})x₂(2), . . . , ^({circumflex over ( )})x₂(T)} of the T samples obtained by the stereo decoding unit 620 of the decoding device 600 described above decoding the b_(S)-bit stereo code CS that is a code different from the monaural code CM without using the information obtained by decoding the monaural code CM or the monaural code CM. The monaural decoded sound signal input in units of frames to the sound signal purification device 1301 is, for example, the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} of the T samples obtained by the monaural decoding unit 610 of the decoding device 600 described above decoding the b_(M)-bit monaural code CM that is a code different from the stereo code CS without using the information obtained by decoding the stereo code CS or the stereo code CS. The monaural code CM is a code derived from the same sound signal as the sound signal from which the stereo code CS is derived (that is, the first channel input sound signal X₁ and the second channel input sound signal X₂ input to the encoding device 500), but is a code different from the code from which the first channel decoded sound signal ^({circumflex over ( )})X₁ and the second channel decoded sound signal ^({circumflex over ( )})X₂ are obtained (that is, the stereo code CS). Assuming that the channel number n of the first channel is 1 and the channel number n of the second channel is 2, the sound signal purification device 1301 performs steps S1331, S1351, S1361, and S1371, and steps S1311-n, S1321-n, S1381-n, and S1391-n for the each channel as illustrated in FIG. 16 for the each frame.

[Inter-Channel Relationship Information Estimation Unit 1331]

At least the first channel decoded sound signal ^({circumflex over ( )})X₁ input to the sound signal purification device 1301 and the second channel decoded sound signal ^({circumflex over ( )})X₂ input to the sound signal purification device 1301 are input to the inter-channel relationship information estimation unit 1331. The inter-channel relationship information estimation unit 1331 obtains and outputs the inter-channel relationship information by using at least the first channel decoded sound signal ^({circumflex over ( )})X₁ and the second channel decoded sound signal ^({circumflex over ( )})X₂ (step S1331). The inter-channel relationship information is information indicating a relationship between the channels of the stereo. Examples of the inter-channel relationship information are the inter-channel time difference τ, the inter-channel correlation coefficient γ, and the preceding channel information. The inter-channel relationship information estimation unit 1331 may obtain a plurality of types of the inter-channel relationship information and, for example, may obtain the inter-channel time difference τ, the inter-channel correlation coefficient γ, and the preceding channel information. As a method of the inter-channel relationship information estimation unit 1331 to obtain the inter-channel time difference τ and a method thereof to obtain the inter-channel correlation coefficient γ, for example, it is only required that the methods described above in the description of the inter-channel relationship information estimation unit 1132 of the second embodiment are used. In a case where the decoded sound common signal estimation unit 1351 uses the preceding channel information, the inter-channel relationship information estimation unit 1331 obtains the preceding channel information. As a method of the inter-channel relationship information estimation unit 1331 to obtain the preceding channel information, for example, it is only required that the method described above in the description of the inter-channel relationship information estimation unit 1231 of the fourth embodiment is used. Note that the inter-channel time difference τ obtained by the method described above in the description of the inter-channel relationship information estimation unit 1132 includes the information indicating the number of samples |τ| corresponding to the time difference between the first channel and the second channel and the information indicating which channel of the first channel and the second channel is preceding, and thus, in a case where the inter-channel relationship information estimation unit 1331 also obtains and outputs the preceding channel information, information indicating the number of samples |τ| corresponding to the time difference between the first channel and the second channel may be obtained and output instead of the inter-channel time difference τ.

[Decoded Sound Common Signal Estimation Unit 1351]

At least the first channel decoded sound signal ^({circumflex over ( )})X₁={^({circumflex over ( )})x₁(1), ^({circumflex over ( )})x₁(2), . . . , ^({circumflex over ( )})x₁(T)} and the second channel decoded sound signal ^({circumflex over ( )})X₂={^({circumflex over ( )})x₂(1), ^({circumflex over ( )})x₂(2), . . . , ^({circumflex over ( )})x₂(T)} input to the sound signal purification device 1301 are input to the decoded sound common signal estimation unit 1351. The decoded sound common signal estimation unit 1351 obtains and outputs the decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} by using at least the first channel decoded sound signal ^({circumflex over ( )})X₁ and the second channel decoded sound signal ^({circumflex over ( )})X₂ (step S1351). As a method of the decoded sound common signal estimation unit 1351 to obtain the decoded sound common signal ^({circumflex over ( )})Y_(M), for example, it is only required that the method described above in the description of the decoded sound common signal estimation unit 1251 of the fourth embodiment is used.

[Decoded Sound Common Signal Upmixing Unit 1361]

At least the decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} output by the decoded sound common signal estimation unit 1351 and the inter-channel relationship information output by the inter-channel relationship information estimation unit 1331 are input to the decoded sound common signal upmixing unit 1361. The decoded sound common signal upmixing unit 1361 performs the upmixing process using at least the decoded sound common signal ^({circumflex over ( )})Y_(M)={^({circumflex over ( )})y_(M)(1), ^({circumflex over ( )})y_(M)(2), . . . , ^({circumflex over ( )})y_(M)(T)} and the inter-channel relationship information, to thereby obtain and output an n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn)={^({circumflex over ( )})y_(Mn)(1), ^({circumflex over ( )})y_(Mn)(2), . . . , ^({circumflex over ( )})y_(Mn)(T)} that is a signal obtained by upmixing the decoded sound common signal for the each channel (step S1361). The decoded sound common signal upmixing unit 1361 is only required to perform the same processing as the decoded sound common signal upmixing unit 1262 of the fifth embodiment. That is, it is only required to perform, for example, the first method or the second method described above in the description of the decoded sound common signal upmixing unit 1262 of the fifth embodiment. Note that, in a case where the decoded sound common signal upmixing unit 1361 performs the second method, the first channel decoded sound signal input to the sound signal purification device 1301 and the second channel decoded sound signal input to the sound signal purification device 1301 are also input to the decoded sound common signal upmixing unit 1361 as indicated by broken lines in FIG. 15.

[Monaural Decoded Sound Upmixing Unit 1371]

The monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} input to the sound signal purification device 1301 and the inter-channel relationship information output by the inter-channel relationship information estimation unit 1331 are input to the monaural decoded sound upmixing unit 1371. The monaural decoded sound upmixing unit 1371 performs the upmixing process using the monaural decoded sound signal ^({circumflex over ( )})X_(M)={^({circumflex over ( )})x_(M)(1), ^({circumflex over ( )})x_(M)(2), . . . , ^({circumflex over ( )})x_(M)(T)} and the inter-channel relationship information, to thereby obtain and output the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1), ^({circumflex over ( )})x_(Mn)(2), . . . , ^({circumflex over ( )})x_(Mn)(T)} that is a signal obtained by upmixing the monaural decoded sound signal for the each channel (step S1371). The monaural decoded sound upmixing unit 1371 is only required to perform the same processing as the monaural decoded sound upmixing unit 1172 of the second embodiment.

[n-Th Channel Purification Weight Estimation Unit 1311-n]

The n-th channel purification weight estimation unit 1311-n obtains and outputs the n-th channel purification weight α_(Mn) (step S1311-n). The n-th channel purification weight estimation unit 1311-n obtains the n-th channel purification weight α_(Mn) by a method similar to the method based on the principle of minimizing the quantization error described in the first embodiment. The n-th channel purification weight α_(Mn) obtained by the n-th channel purification weight estimation unit 1311-n is a value of 0 or more and 1 or less. However, since the n-th channel purification weight estimation unit 1311-n obtains the n-th channel purification weight α_(Mn) for the each frame by the method to be described later, the n-th channel purification weight α_(Mn) does not become zero or one in all the frames. That is, there is a frame in which the n-th channel purification weight α_(Mn) is a value larger than 0 and smaller than 1. In other words, in at least any one of all the frames, the n-th channel purification weight α_(Mn) is a value larger than 0 and smaller than 1.

Specifically, as in the following first to seventh examples, the n-th channel purification weight estimation unit 1311-n obtains the n-th channel purification weight α_(Mn) by using the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) instead of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) at a position where the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) is used in the method based on the principle of minimizing the quantization error described in the first embodiment, by using the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) instead of the monaural decoded sound signal ^({circumflex over ( )})X_(M) at a position where the monaural decoded sound signal ^({circumflex over ( )})X_(M) is used in the method based on the principle of minimizing the quantization error described in the first embodiment, and by using the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS instead of the number of bits b_(n) at a position where the number of bits b_(n) corresponding to the n-th channel in the number of bits of the stereo code CS is used in the method based on the principle of minimizing the quantization error described in the first embodiment. That is, in the following first to seventh examples, the number of bits b_(M) of the monaural code CM and the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS are used. A method for specifying the number of bits b_(M) of the monaural code CM is the same as that in the first embodiment, and a method for specifying the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS is the same as that in the fourth embodiment. The n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn)={^({circumflex over ( )})y_(Mn)(1), ^({circumflex over ( )})y_(Mn)(2), . . . , ^({circumflex over ( )})y_(Mn)(T)} output by the decoded sound common signal upmixing unit 1361 and the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1), ^({circumflex over ( )})x_(Mn)(2), . . . , ^({circumflex over ( )})x_(Mn)(T)} output by the monaural decoded sound upmixing unit 1371 are input to the n-th channel purification weight estimation unit 1311-n as necessary as indicated by one-dot chain lines in FIG. 15.

First Example

The n-th channel purification weight estimation unit 1311-n of the first example obtains the n-th channel purification weight α_(Mn) by the following Expression (7-5) using the number of samples T per frame, the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS, and the number of bits b_(M) of the monaural code CM.

$\begin{matrix}\lbrack {{Math}.37} \rbrack &  \\{\alpha_{Mn} = \frac{2^{- \frac{2b_{m}}{T}}}{2^{- \frac{2b_{m}}{T}} + 2^{- \frac{2b_{M}}{T}}}} & ( {7 - 5} )\end{matrix}$

Note that, since the n-th channel purification weight α_(Mn) obtained in the first example has the same value in all the channels, the sound signal purification device 1301 may include the purification weight estimation unit 1311 common to all the channels instead of the n-th channel purification weight estimation unit 1311-n of the each channel, and the purification weight estimation unit 1311 may obtain the n-th channel purification weight α_(Mn) common to all the channels by Expression (7-5).
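Expression (7-5) can be sketched in Python as follows. The function name `purification_weight_first_example` and the numeric values in the usage note are assumptions introduced for illustration, and the sketch assumes bit counts and frame lengths for which the powers of two do not underflow in floating point.

```python
def purification_weight_first_example(T, b_m, b_M):
    """Evaluate Expression (7-5).

    T:   number of samples per frame
    b_m: number of bits corresponding to the common signal in the stereo code CS
    b_M: number of bits of the monaural code CM
    """
    stereo_term = 2.0 ** (-2.0 * b_m / T)
    monaural_term = 2.0 ** (-2.0 * b_M / T)
    return stereo_term / (stereo_term + monaural_term)
```

With hypothetical values T=320, b_(m)=160, and b_(M)=320, the two terms are 2^(−1) and 2^(−2), so the weight obtained by this sketch is 2/3; with b_(m) equal to b_(M), the weight is 0.5.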

Second Example

The n-th channel purification weight estimation unit 1311-n of thesecond example uses at least the number of bits b_(m) corresponding tothe common signal in the number of bits of the stereo code CS and thenumber of bits b_(M) of the monaural code CM to obtain a value that islarger than 0 and smaller than 1, 0.5 when b_(m) and b_(M) are equal,closer to 0 than 0.5 as b_(m) is larger than b_(M), and closer to 1 than0.5 as b_(M) is larger than b_(m) as the n-th channel purificationweight α_(Mn). Note that, since the n-th channel purification weightα_(Mn) obtained in the second example may have the same value in all thechannels, the sound signal purification device 1301 may include thepurification weight estimation unit 1311 common to all the channelsinstead of the n-th channel purification weight estimation unit 1311-nof the each channel, and the purification weight estimation unit 1311may obtain the n-th channel purification weight α_(Mn) common to all thechannels satisfying the above-described conditions.

Third Example

The n-th channel purification weight estimation unit 1311-n of the thirdexample obtains the value c_(n)×r_(n) obtained by multiplying thecorrection coefficient c_(n) obtained by

$\begin{matrix}\lbrack {{Math}.38} \rbrack &  \\{c_{n} = \frac{2^{- \frac{2b_{m}}{T}}}{2^{- \frac{2b_{m}}{T}} + 2^{- \frac{2b_{M}}{T}}}} & ( {7 - 8} )\end{matrix}$

using the number of samples T per frame, the number of bits b_(m)corresponding to the common signal in the number of bits of the stereocode CS, and the number of bits b_(M) of the monaural code CM by thenormalized inner product value r_(n) for the n-th channel upmixedmonaural decoded sound signal ^({circumflex over ( )})X_(Mn) of the n-thchannel upmixed common signal ^({circumflex over ( )})Y_(Mn), as then-th channel purification weight α_(Mn).

The n-th channel purification weight estimation unit 1311-n of the third example obtains the n-th channel purification weight α_(Mn) by performing, for example, the following steps S1311-31-n to S1311-33-n. The n-th channel purification weight estimation unit 1311-n first obtains a normalized inner product value r_(n) for the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) of the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) by the following Expression (7-6) from the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn)={^({circumflex over ( )})y_(Mn)(1), ^({circumflex over ( )})y_(Mn)(2), . . . , ^({circumflex over ( )})y_(Mn)(T)} and the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1), ^({circumflex over ( )})x_(Mn)(2), . . . , ^({circumflex over ( )})x_(Mn)(T)} (step S1311-31-n).

$\begin{matrix}\lbrack {{Math}.39} \rbrack &  \\{r_{n} = \frac{{\sum}_{t = 1}^{T}{{\hat{y}}_{Mn}(t)}{{\hat{x}}_{Mn}(t)}}{{\sum}_{t = 1}^{T}{{\hat{x}}_{Mn}(t)}{{\hat{x}}_{Mn}(t)}}} & ( {7 - 6} )\end{matrix}$

The n-th channel purification weight estimation unit 1311-n also obtainsthe correction coefficient c_(n) by Expression (7-8) using the number ofsamples T per frame, the number of bits b_(m) corresponding to thecommon signal in the number of bits of the stereo code CS, and thenumber of bits b_(M) of the monaural code CM (step S1311-32-n). Next,the n-th channel purification weight estimation unit 1311-n obtains avalue c_(n)×r_(n) obtained by multiplying the normalized inner productvalue r_(n) obtained in step S1311-31-n by the correction coefficientc_(n) obtained in step S1311-32-n as the n-th channel purificationweight α_(Mn) (step S1311-33-n).
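Steps S1311-31-n to S1311-33-n can be sketched, for example, as follows. All function names are hypothetical. Returning 0 when the energy of the upmixed monaural decoded sound signal is zero is an assumption of this sketch, since that case is not addressed in this description.

```python
def normalized_inner_product(y_mn, x_mn):
    """Expression (7-6): inner product of y_mn and x_mn, normalized by the energy of x_mn."""
    num = sum(y * x for y, x in zip(y_mn, x_mn))
    den = sum(x * x for x in x_mn)
    # Assumption of this sketch: fall back to 0 for an all-zero frame.
    return num / den if den > 0.0 else 0.0


def correction_coefficient(b_m, b_M, T):
    """Expression (7-8)."""
    q_m = 2.0 ** (-2.0 * b_m / T)
    q_M = 2.0 ** (-2.0 * b_M / T)
    return q_m / (q_m + q_M)


def purification_weight_third_example(y_mn, x_mn, b_m, b_M):
    """alpha_Mn = c_n * r_n (steps S1311-31-n to S1311-33-n)."""
    T = len(x_mn)                              # number of samples per frame
    r_n = normalized_inner_product(y_mn, x_mn)  # step S1311-31-n
    c_n = correction_coefficient(b_m, b_M, T)   # step S1311-32-n
    return c_n * r_n                            # step S1311-33-n
```

With identical input signals and equal bit counts, r_(n) is 1 and c_(n) is 0.5, so the sketch returns 0.5.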

Fourth Example

The n-th channel purification weight estimation unit 1311-n of thefourth example uses the number of bits corresponding to the commonsignal in the number of bits of the stereo code CS as b_(m) and thenumber of bits of the monaural code CM as b_(M) to obtain a valuec_(n)×r_(n) obtained by multiplying r_(n) that is a value of 0 or moreand 1 or less, closer to 1 as the correlation between the n-th channelupmixed common signal ^({circumflex over ( )})Y_(Mn) and the n-thchannel upmixed monaural decoded sound signal^({circumflex over ( )})X_(Mn) is higher, and closer to 0 as thecorrelation is lower by the correction coefficient c_(n) that is a valuelarger than 0 and smaller than 1, 0.5 when b_(m) and b_(M) are equal,closer to 0 than 0.5 as b_(m) is larger than b_(M), and closer to 1 than0.5 as b_(m) is smaller than b_(M), as the n-th channel purificationweight α_(Mn).

Fifth Example

The n-th channel purification weight estimation unit 1311-n of the fifthexample obtains the n-th channel purification weight α_(Mn) byperforming the following steps S1311-51-n to S1311-55-n.

The n-th channel purification weight estimation unit 1311-n firstobtains the inner product value E_(n)(0) to be used in the current frameby the following Expression (7-9) using the n-th channel upmixed commonsignal^({circumflex over ( )})Y_(Mn)={^({circumflex over ( )})y_(Mn)(1),^({circumflex over ( )})y_(Mn)(2), . . . ,^({circumflex over ( )})y_(Mn)(T)}, the n-th channel upmixed monauraldecoded sound signal^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1),^({circumflex over ( )})x_(Mn)(2), . . . ,^({circumflex over ( )})x_(Mn)(T)}, and the inner product valueE_(n)(−1) that has been used in the previous frame (step S1311-51-n).

$\begin{matrix}\lbrack {{Math}.40} \rbrack &  \\{{E_{n}(0)} = {{\epsilon_{n}{E_{n}( {- 1} )}} + {\frac{( {1 - \epsilon_{n}} )}{T}{\sum\limits_{t = 1}^{T}{{{\hat{y}}_{Mn}(t)}{{\hat{x}}_{Mn}(t)}}}}}} & ( {7 - 9} )\end{matrix}$

Here, ε_(n) is a predetermined value larger than 0 and smaller than 1, and is stored in advance in the n-th channel purification weight estimation unit 1311-n. Note that the n-th channel purification weight estimation unit 1311-n stores the obtained inner product value E_(n)(0) in the n-th channel purification weight estimation unit 1311-n in order to use this inner product value E_(n)(0) as the “inner product value E_(n)(−1) that has been used in the previous frame” in the next frame.

The n-th channel purification weight estimation unit 1311-n also obtainsthe energy E_(Mn)(0) of the n-th channel upmixed monaural decoded soundsignal to be used in the current frame by the following Expression(7-10) using the n-th channel upmixed monaural decoded sound signal^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1),^({circumflex over ( )})x_(Mn)(2), . . . ,^({circumflex over ( )})x_(Mn)(T)} and the energy E_(Mn)(−1) of the n-thchannel upmixed monaural decoded sound signal that has been used in theprevious frame (step S1311-52-n).

$\begin{matrix}\lbrack {{Math}.41} \rbrack &  \\{{E_{Mn}(0)} = {{\epsilon_{Mn}{E_{Mn}( {- 1} )}} + {\frac{( {1 - \epsilon_{Mn}} )}{T}{\sum\limits_{t = 1}^{T}{{{\hat{x}}_{Mn}(t)}{{\hat{x}}_{Mn}(t)}}}}}} & ( {7 - 10} )\end{matrix}$

Here, ε_(Mn) is a predetermined value larger than 0 and smaller than 1, and is stored in advance in the n-th channel purification weight estimation unit 1311-n. Note that the n-th channel purification weight estimation unit 1311-n stores the obtained energy E_(Mn)(0) of the n-th channel upmixed monaural decoded sound signal in the n-th channel purification weight estimation unit 1311-n in order to use this energy E_(Mn)(0) as the “energy E_(Mn)(−1) of the n-th channel upmixed monaural decoded sound signal that has been used in the previous frame” in the next frame.

Next, the n-th channel purification weight estimation unit 1311-nobtains the normalized inner product value r_(n) by the followingExpression (7-11) using the inner product value E_(n)(0) to be used inthe current frame obtained in step S1311-51-n and the energy E_(Mn)(0)of the n-th channel upmixed monaural decoded sound signal used in thecurrent frame obtained in step S1311-52-n (step S1311-53-n).

[Math. 42]

r _(n) =E _(n)(0)/E _(Mn)(0)  (7-11)

The n-th channel purification weight estimation unit 1311-n also obtainsthe correction coefficient c_(n) by Expression (7-8) (step S1311-54-n).Next, the n-th channel purification weight estimation unit 1311-nobtains the value c_(n)×r_(n) obtained by multiplying the normalizedinner product value r_(n) obtained in step S1311-53-n and the correctioncoefficient c_(n) obtained in step S1311-54-n as the n-th channelpurification weight α_(Mn) (step S1311-55-n).

That is, the n-th channel purification weight estimation unit 1311-n of the fifth example obtains, as the n-th channel purification weight α_(Mn), the value c_(n)×r_(n) obtained by multiplying the normalized inner product value r_(n) by the correction coefficient c_(n). Here, the normalized inner product value r_(n) is obtained by Expression (7-11) using the inner product value E_(n)(0) and the energy E_(Mn)(0) of the n-th channel upmixed monaural decoded sound signal. The inner product value E_(n)(0) is obtained by Expression (7-9) using each sample value ^({circumflex over ( )})y_(Mn)(t) of the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn), each sample value ^({circumflex over ( )})x_(Mn)(t) of the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn), and the inner product value E_(n)(−1) of the previous frame. The energy E_(Mn)(0) is obtained by Expression (7-10) using each sample value ^({circumflex over ( )})x_(Mn)(t) of the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) and the energy E_(Mn)(−1) of the n-th channel upmixed monaural decoded sound signal of the previous frame. The correction coefficient c_(n) is obtained by Expression (7-8) using the number of samples T per frame, the number of bits b_(m) corresponding to the common signal in the number of bits of the stereo code CS, and the number of bits b_(M) of the monaural code CM.
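The frame-to-frame recursion of the fifth example can be sketched, for example, with a small stateful object. The class name, the default smoothing values of 0.9, and the initial state of zero are assumptions of this sketch; this description only requires ε_(n) and ε_(Mn) to be predetermined values larger than 0 and smaller than 1.

```python
class SmoothedWeightEstimator:
    """Hypothetical per-channel estimator for steps S1311-51-n to S1311-55-n."""

    def __init__(self, eps_n=0.9, eps_mn=0.9):
        self.eps_n = eps_n      # smoothing value for the inner product
        self.eps_mn = eps_mn    # smoothing value for the energy
        self.e_n_prev = 0.0     # E_n(-1); initial value is an assumption
        self.e_mn_prev = 0.0    # E_Mn(-1); initial value is an assumption

    def update(self, y_mn, x_mn, b_m, b_M):
        T = len(x_mn)
        cross = sum(y * x for y, x in zip(y_mn, x_mn))
        energy = sum(x * x for x in x_mn)
        # Expressions (7-9) and (7-10): recursive smoothing across frames.
        e_n = self.eps_n * self.e_n_prev + (1.0 - self.eps_n) / T * cross
        e_mn = self.eps_mn * self.e_mn_prev + (1.0 - self.eps_mn) / T * energy
        # Store the values for use as E_n(-1) and E_Mn(-1) in the next frame.
        self.e_n_prev, self.e_mn_prev = e_n, e_mn
        # Expression (7-11); the zero fallback is an assumption of this sketch.
        r_n = e_n / e_mn if e_mn > 0.0 else 0.0
        # Expression (7-8).
        q_m = 2.0 ** (-2.0 * b_m / T)
        q_M = 2.0 ** (-2.0 * b_M / T)
        c_n = q_m / (q_m + q_M)
        return c_n * r_n
```

One such object would be kept per channel and `update` would be called once per frame, so that a sudden change of the per-frame correlation affects the weight only gradually.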

Sixth Example

The n-th channel purification weight estimation unit 1311-n of the sixthexample obtains a value λ×c_(n)×r_(n) obtained by multiplying thenormalized inner product value r_(n) and the correction coefficientc_(n) described in the third example or the normalized inner productvalue r_(n) and the correction coefficient c_(n) described in the fifthexample by λ that is a predetermined value larger than 0 and smallerthan 1, as the n-th channel purification weight α_(Mn).

Seventh Example

The n-th channel purification weight estimation unit 1311-n of theseventh example obtains a value γ×c_(n)×r_(n) obtained by multiplyingthe normalized inner product value r_(n) and the correction coefficientc_(n) described in the third example or the normalized inner productvalue r_(n) and the correction coefficient c_(n) described in the fifthexample by the inter-channel correlation coefficient γ obtained by theinter-channel relationship information estimation unit 1331, as the n-thchannel purification weight α_(Mn).

[n-Th Channel Signal Purification Unit 1321-n]

The n-th channel upmixed common signal^({circumflex over ( )})Y_(Mn)={^({circumflex over ( )})y_(Mn)(1),^({circumflex over ( )})y_(Mn)(2), . . . ,^({circumflex over ( )})y_(Mn)(T)} output by the decoded sound commonsignal upmixing unit 1361, the n-th channel upmixed monaural decodedsound signal^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1),^({circumflex over ( )})x_(Mn)(2), . . . ,^({circumflex over ( )})x_(Mn)(T)} output by the monaural decoded soundupmixing unit 1371, and the n-th channel purification weight α_(Mn)output by the n-th channel purification weight estimation unit 1311-nare input to the n-th channel signal purification unit 1321-n. For eachcorresponding sample t, the n-th channel signal purification unit 1321-nobtains and outputs a sequence based on a value ^(˜)y_(Mn)(t) obtainedby adding a value α_(Mn)×^({circumflex over ( )})x_(Mn)(t) obtained bymultiplying the n-th channel purification weight α_(Mn) by the samplevalue ^({circumflex over ( )})x_(Mn)(t) of the n-th channel upmixedmonaural decoded sound signal ^({circumflex over ( )})X_(Mn) and a value(1−α_(Mn))×^({circumflex over ( )})y_(Mn)(t) obtained by multiplying avalue (1−α_(Mn)) obtained by subtracting the n-th channel purificationweight α_(Mn) from 1 by the sample value^({circumflex over ( )})y_(Mn)(t) of the n-th channel upmixed commonsignal ^({circumflex over ( )})Y_(Mn), as the n-th channel purifiedupmixed signal ^(˜)Y_(Mn)={^(˜)y_(Mn)(1), ^(˜)y_(Mn)(2), . . . ,^(˜)y_(Mn)(T)} (step S1321-n). That is,^(˜)y_(Mn)(t)=(1−α_(Mn))×^({circumflex over ( )})y_(Mn)(t)+α_(Mn)×^({circumflex over ( )})x_(Mn)(t).
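Step S1321-n is a per-sample weighted addition, which can be sketched as follows. The function name `purify_upmixed_signal` is hypothetical, and the signals are assumed to be given as equal-length sequences of sample values for one frame.

```python
def purify_upmixed_signal(y_mn, x_mn, alpha_mn):
    """Return (1 - alpha_Mn) * y_Mn(t) + alpha_Mn * x_Mn(t) for each sample t."""
    return [(1.0 - alpha_mn) * y + alpha_mn * x for y, x in zip(y_mn, x_mn)]


# A weight of 0 keeps the upmixed common signal; a weight of 1 replaces it
# with the upmixed monaural decoded sound signal.
print(purify_upmixed_signal([0.0, 2.0], [2.0, 0.0], 0.5))
```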

[n-Th Channel Separation Combination Weight Estimation Unit 1381-n]

The n-th channel decoded sound signal^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1),^({circumflex over ( )})x_(n)(2), . . . ,^({circumflex over ( )})x_(n)(T)} input to the sound signal purificationdevice 1301 and the n-th channel upmixed common signal^({circumflex over ( )})Y_(Mn)={^({circumflex over ( )})y_(Mn)(1),^({circumflex over ( )})y_(Mn)(2), . . . ,^({circumflex over ( )})y_(Mn)(T)} output by the decoded sound commonsignal upmixing unit 1361 are input to the n-th channel separationcombination weight estimation unit 1381-n. The n-th channel separationcombination weight estimation unit 1381-n obtains and outputs thenormalized inner product value for the n-th channel upmixed commonsignal ^({circumflex over ( )})Y_(Mn) of the n-th channel decoded soundsignal ^({circumflex over ( )})X_(n) from the n-th channel decoded soundsignal ^({circumflex over ( )})X_(n) and the n-th channel upmixed commonsignal ^({circumflex over ( )})Y_(Mn), as the n-th channel separationcombination weight β_(n) (step S1381-n).

Specifically, the n-th channel separation combination weight β_(n) is asrepresented by Expression (71).

$\begin{matrix}\lbrack {{Math}.43} \rbrack &  \\{\beta_{n} = \frac{{\sum}_{t = 1}^{T}{{\hat{x}}_{n}(t)}{{\hat{y}}_{Mn}(t)}}{{\sum}_{t = 1}^{T}{{\hat{y}}_{Mn}(t)}{{\hat{y}}_{Mn}(t)}}} & (71)\end{matrix}$
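Expression (71) can be sketched, for example, as follows. The function name is hypothetical, and returning 0 when the n-th channel upmixed common signal has zero energy is an assumption of this sketch, since that case is not addressed in this description.

```python
def separation_combination_weight(x_n, y_mn):
    """Expression (71): inner product of x_n and y_Mn, normalized by the energy of y_Mn."""
    num = sum(x * y for x, y in zip(x_n, y_mn))
    den = sum(y * y for y in y_mn)
    # Assumption of this sketch: fall back to 0 for an all-zero frame.
    return num / den if den > 0.0 else 0.0
```

For instance, when the decoded sound signal is exactly twice the upmixed common signal, the weight evaluates to 2.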

[n-Th Channel Separation Combination Unit 1391-n]

The n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} input to the sound signal purification device 1301, the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn)={^({circumflex over ( )})y_(Mn)(1), ^({circumflex over ( )})y_(Mn)(2), . . . , ^({circumflex over ( )})y_(Mn)(T)} output by the decoded sound common signal upmixing unit 1361, the n-th channel purified upmixed signal ^(˜)Y_(Mn)={^(˜)y_(Mn)(1), ^(˜)y_(Mn)(2), . . . , ^(˜)y_(Mn)(T)} output by the n-th channel signal purification unit 1321-n, and the n-th channel separation combination weight β_(n) output by the n-th channel separation combination weight estimation unit 1381-n are input to the n-th channel separation combination unit 1391-n. For each corresponding sample t, the n-th channel separation combination unit 1391-n obtains and outputs a sequence based on a value ^(˜)x_(n)(t) obtained by subtracting a value β_(n)×^({circumflex over ( )})y_(Mn)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by the sample value ^({circumflex over ( )})y_(Mn)(t) of the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) from the sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and adding a value β_(n)×^(˜)y_(Mn)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by the sample value ^(˜)y_(Mn)(t) of the n-th channel purified upmixed signal ^(˜)Y_(Mn), as the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)} (step S1391-n). That is, ^(˜)x_(n)(t)=^({circumflex over ( )})x_(n)(t)−β_(n)×^({circumflex over ( )})y_(Mn)(t)+β_(n)×^(˜)y_(Mn)(t).
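Step S1391-n can be sketched, for example, as follows. The function name `separate_and_combine` is hypothetical, and the three signals are assumed to be equal-length sequences of sample values for one frame.

```python
def separate_and_combine(x_n, y_mn, y_mn_purified, beta_n):
    """Return x_n(t) - beta_n * y_Mn(t) + beta_n * purified y_Mn(t) for each sample t."""
    return [
        x - beta_n * y + beta_n * y_p
        for x, y, y_p in zip(x_n, y_mn, y_mn_purified)
    ]


# The component of the decoded signal explained by the upmixed common signal
# is swapped for the same proportion of the purified upmixed signal.
print(separate_and_combine([1.0, 2.0], [0.5, 1.0], [1.0, 1.0], 2.0))
```

When the purified upmixed signal equals the upmixed common signal, the two β_(n) terms cancel and the decoded sound signal is returned unchanged.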

Eighth Embodiment

Similarly to the sound signal purification device of the seventhembodiment, a sound signal purification device of an eighth embodimentalso improves the decoded sound signal of the each channel of the stereoby using a monaural decoded sound signal obtained from a code differentfrom the code from which the decoded sound signal is obtained. The soundsignal purification device of the eighth embodiment is different fromthe sound signal purification device of the seventh embodiment in thatthe inter-channel relationship information is obtained not from adecoded sound signal but from a code. Hereinafter, regarding the soundsignal purification device of the eighth embodiment, differences fromthe sound signal purification device of the seventh embodiment will bedescribed using an example in a case where the number of channels of thestereo is two.

<<Sound Signal Purification Device 1302>>

As illustrated in FIG. 17 , the sound signal purification device 1302 ofthe eighth embodiment includes an inter-channel relationship informationdecoding unit 1342, the decoded sound common signal estimation unit1351, the decoded sound common signal upmixing unit 1361, the monauraldecoded sound upmixing unit 1371, the first channel purification weightestimation unit 1311-1, the first channel signal purification unit1321-1, the first channel separation combination weight estimation unit1381-1, the first channel separation combination unit 1391-1, the secondchannel purification weight estimation unit 1311-2, the second channelsignal purification unit 1321-2, the second channel separationcombination weight estimation unit 1381-2, and the second channelseparation combination unit 1391-2. For the each frame, as illustratedin FIG. 18 , the sound signal purification device 1302 performs stepsS1342, S1351, S1361, and S1371, and steps S1311-n, S1321-n, S1381-n, andS1391-n for the each channel. The sound signal purification device 1302of the eighth embodiment is different from the sound signal purificationdevice 1301 of the seventh embodiment in that the inter-channelrelationship information decoding unit 1342 is provided instead of theinter-channel relationship information estimation unit 1331, and stepS1342 is performed instead of step S1331. Further, the inter-channelrelationship information code CC of the each frame is also input to thesound signal purification device 1302 of the eighth embodiment. Theinter-channel relationship information code CC may be a code obtainedand output by the inter-channel relationship information encoding unit,which is not illustrated, included in the above-described encodingdevice 500, or may be a code included in the stereo code CS obtained andoutput by the stereo encoding unit 530 of the above-described encodingdevice 500.

Hereinafter, differences between the sound signal purification device1302 of the eighth embodiment and the sound signal purification device1301 of the seventh embodiment will be described.

[Inter-Channel Relationship Information Decoding Unit 1342]

The inter-channel relationship information code CC input to the soundsignal purification device 1302 is input to the inter-channelrelationship information decoding unit 1342. The inter-channelrelationship information decoding unit 1342 decodes the inter-channelrelationship information code CC to obtain and output the inter-channelrelationship information (step S1342). The inter-channel relationshipinformation obtained by the inter-channel relationship informationdecoding unit 1342 is the same as the inter-channel relationshipinformation obtained by the inter-channel relationship informationestimation unit 1331 of the seventh embodiment.

Modification Example of Eighth Embodiment

In a case where the inter-channel relationship information code CC is acode included in the stereo code CS, the same inter-channel relationshipinformation obtained in step S1342 is obtained by decoding in the stereodecoding unit 620 of the decoding device 600.

Therefore, in a case where the inter-channel relationship information code CC is a code included in the stereo code CS, the inter-channel relationship information obtained by the stereo decoding unit 620 of the decoding device 600 may be input to the sound signal purification device 1302 of the eighth embodiment, in which case the sound signal purification device 1302 of the eighth embodiment need not include the inter-channel relationship information decoding unit 1342 and need not perform step S1342.

Further, in a case where only a part of the inter-channel relationshipinformation code CC is a code included in the stereo code CS, it is onlyrequired that the inter-channel relationship information obtained bydecoding the code included in the stereo code CS in the inter-channelrelationship information code CC by the stereo decoding unit 620 of thedecoding device 600 is input to the sound signal purification device1302 of the eighth embodiment, and that the inter-channel relationshipinformation decoding unit 1342 of the sound signal purification device1302 of the eighth embodiment decodes, as step S1342, a code notincluded in the stereo code CS in the inter-channel relationshipinformation code CC to obtain and output the inter-channel relationshipinformation that has not been input to the sound signal purificationdevice 1302.

Further, in a case where the code corresponding to a part of theinter-channel relationship information used by each unit of the soundsignal purification device 1302 is not included in the inter-channelrelationship information code CC, the sound signal purification device1302 of the eighth embodiment is only required to also include theinter-channel relationship information estimation unit 1331, so that theinter-channel relationship information estimation unit 1331 alsoperforms step S1331. In this case, as step S1331, the inter-channelrelationship information estimation unit 1331 is only required to obtainand output the inter-channel relationship information that cannot beobtained by decoding the inter-channel relationship information code CCamong pieces of the inter-channel relationship information used byrespective units of the sound signal purification device 1302, similarlyto step S1331 of the seventh embodiment.

Ninth Embodiment

In the decoded sound signal obtained by encoding/decoding the input sound signal, a phase of a high-frequency component rotates with respect to the input sound signal due to distortion caused by encoding processing. The encoding/decoding method for obtaining the monaural decoded sound signal and the encoding/decoding method for obtaining the decoded sound signal of the each channel of the stereo are different encoding/decoding methods independent from each other. Therefore, the high-frequency components of the monaural decoded sound signal obtained by the monaural decoding unit 610 and of the decoded sound signal of the each channel of the stereo obtained by the stereo decoding unit 620 have a small correlation, and the energy of the high-frequency components may be reduced by the weighted addition process in the time domain (hereinafter referred to as “signal purification processing in the time domain” for convenience) in the signal purification unit of the sound signal purification device described above or the separation combination unit of the each channel. As a result, the purified decoded sound signal of the each channel may sound muffled. A sound signal high-frequency compensation device of a ninth embodiment eliminates this muffling by compensating for high-frequency energy using the high-frequency component of a signal before the signal purification processing.

Note that a case where the sound signal is heard like being muffled dueto the reduction in energy of the high-frequency component is notlimited to the purified decoded sound signal obtained by performing thesignal purification processing in the time domain by the sound signalpurification device described above on the decoded sound signal of theeach channel, and a sound signal obtained by performing the signalprocessing in the time domain other than the signal purificationprocessing by the sound signal purification device described above onthe decoded sound signal of the each channel may also be heard likebeing muffled. The sound signal high-frequency compensation device ofthe ninth embodiment can eliminate the muffling by compensating forhigh-frequency energy using a high-frequency component of a signalbefore signal processing in the time domain regardless of whether or notit is the signal purification processing in the time domain by the soundsignal purification device described above.

Hereinafter, not only the purified decoded sound signal obtained byperforming the signal purification processing by the sound signalpurification device described above on the decoded sound signal of theeach channel, but also the sound signal obtained by performing thesignal processing in the time domain on the decoded sound signal of theeach channel is also referred to as a purified decoded sound signal forconvenience, and the sound signal high-frequency compensation device ofthe ninth embodiment will be described using an example in a case wherethe number of channels of the stereo is two.

<<Sound Signal High-Frequency Compensation Device 201>>

As illustrated in FIG. 19 , a sound signal high-frequency compensationdevice 201 of the ninth embodiment includes a first channelhigh-frequency compensation gain estimation unit 211-1, a first channelhigh-frequency compensation unit 221-1, a second channel high-frequencycompensation gain estimation unit 211-2, and a second channelhigh-frequency compensation unit 221-2. The first channel purifieddecoded sound signal ^(˜)X₁ and the second channel purified decodedsound signal ^(˜)X₂ output by any of the sound signal purificationdevices described above and the first channel decoded sound signal^({circumflex over ( )})X₁ and the second channel decoded sound signal^({circumflex over ( )})X₂ output by the stereo decoding unit 620 of thedecoding device 600 are input to the sound signal high-frequencycompensation device 201. The sound signal high-frequency compensationdevice 201 obtains and outputs, for the each channel of the stereo inunits of frames having a predetermined time length of 20 ms, forexample, a compensated decoded sound signal of the channel, which is asound signal obtained by compensating the high-frequency energy of thepurified decoded sound signal of the channel, by using the purifieddecoded sound signal of the channel and the decoded sound signal of thechannel. Assuming that the channel number n (channel index n) of thefirst channel is 1 and the channel number n of the second channel is 2,the sound signal high-frequency compensation device 201 performs stepsS211-n and S221-n illustrated in FIG. 20 for the each channel for theeach frame. 
Note that the high frequency mentioned here means a band that is not the low frequency band (what is called a “low frequency”) in which a phase is maintained to some extent even by encoding processing. In the high frequency, a difference in phase between the input sound signal and the decoded sound signal is hard to perceive audibly, and thus the phase of a component of approximately 2 kHz or more is often rotated by the encoding processing.

Therefore, the sound signal high-frequency compensation device 201 isonly required to handle, for example, a component having a frequency ofapproximately 2 kHz or more as the high frequency. However, it is notessential that approximately 2 kHz or more are the high frequency, andthe sound signal high-frequency compensation device 201 is only requiredto handle, as the high frequency, a component equal to or higher than apredetermined frequency that divides a frequency band having apossibility of being included in each signal into two. This similarlyapplies to the following embodiments and modification examples. Notethat the first channel purified decoded sound signal ^(˜)X₁ and thesecond channel purified decoded sound signal ^(˜)X₂ input to the soundsignal high-frequency compensation device 201 are not necessarilysignals output by any of the sound signal purification devices describedabove, and are only required to be the first channel purified decodedsound signal ^(˜)X₁ and the second channel purified decoded sound signal^(˜)X₂ which are sound signals obtained by performing the signalprocessing in the time domain on the first channel decoded sound signal^({circumflex over ( )})X₁ and the second channel decoded sound signal^({circumflex over ( )})X₂ output by the stereo decoding unit 620 of thedecoding device 600. This also similarly applies to the followingembodiments and modification examples.

[n-Th Channel High-Frequency Compensation Gain Estimation Unit 211-n]

The n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} input to the sound signal high-frequency compensation device 201 and the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)} input to the sound signal high-frequency compensation device 201 are input to the n-th channel high-frequency compensation gain estimation unit 211-n. The n-th channel high-frequency compensation gain estimation unit 211-n obtains and outputs an n-th channel high-frequency compensation gain ρ_(n) from the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the n-th channel purified decoded sound signal ^(˜)X_(n) (step S211-n). The n-th channel high-frequency compensation gain ρ_(n) is a value for bringing high-frequency energy of an n-th channel compensated decoded sound signal ^(˜)X′_(n) obtained by the n-th channel high-frequency compensation unit 221-n described later close to high-frequency energy of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n). A method by which the n-th channel high-frequency compensation gain estimation unit 211-n obtains the n-th channel high-frequency compensation gain ρ_(n) will be described later.

[n-Th Channel High-Frequency Compensation Unit 221-n]

The n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} input to the sound signal high-frequency compensation device 201, the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)} input to the sound signal high-frequency compensation device 201, and the n-th channel high-frequency compensation gain ρ_(n) output by the n-th channel high-frequency compensation gain estimation unit 211-n are input to the n-th channel high-frequency compensation unit 221-n. The n-th channel high-frequency compensation unit 221-n obtains and outputs a signal obtained by adding the n-th channel purified decoded sound signal ^(˜)X_(n) and a signal obtained by multiplying the high-frequency component of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) by the n-th channel high-frequency compensation gain ρ_(n), as the n-th channel compensated decoded sound signal ^(˜)X′_(n)={^(˜)x′_(n)(1), ^(˜)x′_(n)(2), . . . , ^(˜)x′_(n)(T)} (step S221-n).

For example, the n-th channel high-frequency compensation unit 221-n passes the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) through a high-pass filter to obtain an n-th channel compensation signal ^({circumflex over ( )})X′_(n)={^({circumflex over ( )})x′_(n)(1), ^({circumflex over ( )})x′_(n)(2), . . . , ^({circumflex over ( )})x′_(n)(T)} and, for each corresponding sample t, obtains and outputs a sequence based on a value ^(˜)x′_(n)(t) obtained by adding a sample value ^(˜)x_(n)(t) of the n-th channel purified decoded sound signal ^(˜)X_(n) and a value ρ_(n)×^({circumflex over ( )})x′_(n)(t) obtained by multiplying the n-th channel high-frequency compensation gain ρ_(n) by a sample value ^({circumflex over ( )})x′_(n)(t) of the n-th channel compensation signal ^({circumflex over ( )})X′_(n), as the n-th channel compensated decoded sound signal ^(˜)X′_(n)={^(˜)x′_(n)(1), ^(˜)x′_(n)(2), . . . , ^(˜)x′_(n)(T)}. That is, ^(˜)x′_(n)(t)=^(˜)x_(n)(t)+ρ_(n)×^({circumflex over ( )})x′_(n)(t). As the high-pass filter, it is only required that a high-pass filter having a passband equal to or higher than a predetermined frequency that divides a frequency band having a possibility of being included in each signal into two is used, and for example, in a case where a component having a frequency of 2 kHz or higher is handled as the high frequency, it is only required that a high-pass filter having a passband of 2 kHz or higher is used.
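The processing of step S221-n can be sketched, for example, as follows. The type of high-pass filter is not specified in this description; the first-order recursive filter used here, the function names, and the sampling frequency are all assumptions of this sketch. The filter state is also reset for every call, whereas a practical implementation would likely carry it over between frames.

```python
import math


def high_pass(signal, cutoff_hz, sample_rate_hz):
    """First-order high-pass filter (assumed filter type, chosen for brevity)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    a = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in signal:
        y = a * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def compensate_high_frequency(x_decoded, x_purified, rho_n,
                              cutoff_hz=2000.0, sample_rate_hz=32000.0):
    """Return purified x_n(t) + rho_n * (high-pass filtered decoded x_n(t))."""
    x_comp = high_pass(x_decoded, cutoff_hz, sample_rate_hz)  # compensation signal
    return [xp + rho_n * xc for xp, xc in zip(x_purified, x_comp)]
```

With ρ_(n) equal to 0 the purified decoded sound signal is returned unchanged, and a constant (zero-frequency) decoded sound signal contributes almost nothing once the filter has settled, so only the high-frequency component is added back.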

[Method by which n-th Channel High-Frequency Compensation Gain Estimation Unit 211-n Obtains n-th Channel High-Frequency Compensation Gain ρ_(n)]

The n-th channel high-frequency compensation gain estimation unit 211-n obtains the n-th channel high-frequency compensation gain ρ_(n) by, for example, the following first method or second method.

[[First Method for Obtaining n-th Channel High-Frequency Compensation Gain ρ_(n)]]

In the first method, the n-th channel high-frequency compensation gain estimation unit 211-n obtains an n-th channel high-frequency compensation gain ρ_(n) that is larger the smaller the high-frequency energy of the n-th channel purified decoded sound signal ^(˜)X_(n) is relative to the high-frequency energy of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n). For example, the n-th channel high-frequency compensation gain estimation unit 211-n obtains, as the n-th channel high-frequency compensation gain ρ_(n), the square root of the value (1−^(˜)EX_(n)/^({circumflex over ( )})EX_(n)) obtained by subtracting from 1 the value obtained by dividing the high-frequency energy ^(˜)EX_(n) of the n-th channel purified decoded sound signal ^(˜)X_(n) by the high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n). That is, the n-th channel high-frequency compensation gain estimation unit 211-n obtains the n-th channel high-frequency compensation gain ρ_(n) by the following Expression (91) using the high-frequency energy ^(˜)EX_(n) of the n-th channel purified decoded sound signal ^(˜)X_(n) and the high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n).

[Math. 44]

$$\rho_n=\sqrt{1-\frac{\widetilde{EX}_n}{\widehat{EX}_n}} \qquad (91)$$
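A minimal sketch of the first method, taking the two high-frequency energies as already computed. The function name `gain_first_method` and the handling of the cases the text does not cover (zero decoded energy, or purified energy exceeding decoded energy) are assumptions made for this example.

```python
import math


def gain_first_method(e_purified_hf, e_decoded_hf):
    """Square root of (1 - e_purified_hf / e_decoded_hf), as in Expression (91).

    Assumptions not stated in the text: the gain is set to 0 when the decoded
    high-frequency energy is zero, and the value under the square root is
    clamped at 0 so that the result stays real.
    """
    if e_decoded_hf <= 0.0:
        return 0.0
    ratio = e_purified_hf / e_decoded_hf
    return math.sqrt(max(0.0, 1.0 - ratio))
```

For example, when the purified signal retains three quarters of the decoded signal's high-frequency energy, `gain_first_method(3.0, 4.0)` gives 0.5.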

[[Second Method for Obtaining n-th Channel High-Frequency Compensation Gain ρ_(n)]]

When a signal is passed through a high-pass filter, the phase of each frequency component of the signal rotates. Accordingly, the phases of the high-frequency components may not match between the n-th channel compensation signal ^({circumflex over ( )})X′_(n) and the n-th channel purified decoded sound signal ^(˜)X_(n). In that case, even if the n-th channel high-frequency compensation unit 221-n performs the addition ^(˜)x′_(n)(t)=^(˜)x_(n)(t)+ρ_(n)×^({circumflex over ( )})x′_(n)(t) for each sample t using the n-th channel high-frequency compensation gain ρ_(n) obtained by the first method to obtain the n-th channel compensated decoded sound signal ^(˜)X′_(n), the high-frequency component of the n-th channel compensation signal ^({circumflex over ( )})X′_(n) and the high-frequency component of the n-th channel purified decoded sound signal ^(˜)X_(n) may cancel each other, so that the high-frequency energy of the n-th channel compensated decoded sound signal ^(˜)X′_(n) does not approach the high-frequency energy of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) as expected. The second method is intended to bring the high-frequency energy of the n-th channel compensated decoded sound signal ^(˜)X′_(n) close to the high-frequency energy of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) even if the high-frequency components cancel each other in the above-described addition. In the second method, the n-th channel high-frequency compensation gain estimation unit 211-n obtains the n-th channel high-frequency compensation gain ρ_(n), for example, by performing the following steps S211-21-n to S211-23-n.

The n-th channel high-frequency compensation gain estimation unit 211-n first passes the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) through a high-pass filter having the same characteristics as that used by the n-th channel high-frequency compensation unit 221-n to obtain the n-th channel compensation signal ^({circumflex over ( )})X′_(n)={^({circumflex over ( )})x′_(n)(1), ^({circumflex over ( )})x′_(n)(2), . . . , ^({circumflex over ( )})x′_(n)(T)} (step S211-21-n). Next, the n-th channel high-frequency compensation gain estimation unit 211-n obtains, for each corresponding sample t, a value ^(˜)x″_(n)(t) by adding the sample value ^(˜)x_(n)(t) of the n-th channel purified decoded sound signal ^(˜)X_(n) and the sample value ^({circumflex over ( )})x′_(n)(t) of the n-th channel compensation signal ^({circumflex over ( )})X′_(n), and obtains the sequence of the obtained values as an n-th channel temporary addition signal ^(˜)X″_(n)={^(˜)x″_(n)(1), ^(˜)x″_(n)(2), . . . , ^(˜)x″_(n)(T)} (step S211-22-n). That is, ^(˜)x″_(n)(t)=^(˜)x_(n)(t)+^({circumflex over ( )})x′_(n)(t). Next, the n-th channel high-frequency compensation gain estimation unit 211-n obtains, as the n-th channel high-frequency compensation gain ρ_(n), a value that is larger the smaller the high-frequency energy ^(˜)EX_(n) of the n-th channel purified decoded sound signal ^(˜)X_(n) is relative to the high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and that is larger the smaller the difference obtained by subtracting the high-frequency energy of the n-th channel purified decoded sound signal ^(˜)X_(n) from the high-frequency energy of the n-th channel temporary addition signal ^(˜)X″_(n) is relative to the high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) (step S211-23-n).
For example, the n-th channel high-frequency compensation gain estimation unit 211-n obtains the n-th channel high-frequency compensation gain ρ_(n) by the following Expression (92) using the high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), the high-frequency energy ^(˜)EX_(n) of the n-th channel purified decoded sound signal ^(˜)X_(n), and a value (^(˜)EX″_(n)−^(˜)EX_(n)) obtained by subtracting the high-frequency energy ^(˜)EX_(n) of the n-th channel purified decoded sound signal ^(˜)X_(n) from the high-frequency energy ^(˜)EX″_(n) of the n-th channel temporary addition signal ^(˜)X″_(n).

[Math. 45]

ρ_(n)=√{square root over ({circumflex over (ρ)}_(n) ²+0.25μ_(n)²)}+0.5μ_(n)  (92)

Here, ^({circumflex over ( )})ρ_(n) ² is a value obtained by the following Expression (92a), and μ_(n) is a value obtained by the following Expression (92b).

[Math. 46]

$$\hat{\rho}_n^{2}=1-\frac{\widetilde{EX}_n}{\widehat{EX}_n} \qquad (92a)$$

[Math. 47]

$$\mu_n=1-\frac{\widetilde{EX}''_n-\widetilde{EX}_n}{\widehat{EX}_n} \qquad (92b)$$
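As a reading aid, Expression (92) can be understood as the positive root of the quadratic obtained by requiring the high-frequency energy of ^(˜)x_(n)(t)+ρ_(n)×^({circumflex over ( )})x′_(n)(t) to equal ^({circumflex over ( )})EX_(n). This derivation is not spelled out in the text. It assumes that the high-frequency energy of the n-th channel compensation signal equals ^({circumflex over ( )})EX_(n), and C_(n) is a hypothetical symbol, introduced only here, for the high-frequency cross term between the purified signal and the compensation signal.

```latex
\begin{align*}
\widetilde{EX}''_n &= \widetilde{EX}_n + 2C_n + \widehat{EX}_n
  && \text{(energy of } \tilde{x}_n(t)+\hat{x}'_n(t)\text{)} \\
\widetilde{EX}_n + 2\rho_n C_n + \rho_n^{2}\,\widehat{EX}_n &= \widehat{EX}_n
  && \text{(target for } \tilde{x}_n(t)+\rho_n\hat{x}'_n(t)\text{)} \\
\rho_n^{2} - \mu_n\,\rho_n - \hat{\rho}_n^{2} &= 0,
  && \hat{\rho}_n^{2}=1-\frac{\widetilde{EX}_n}{\widehat{EX}_n},\quad
     \mu_n=1-\frac{\widetilde{EX}''_n-\widetilde{EX}_n}{\widehat{EX}_n} \\
\rho_n &= \sqrt{\hat{\rho}_n^{2}+0.25\,\mu_n^{2}}+0.5\,\mu_n
\end{align*}
```

The third line follows by substituting 2C_(n) from the first line into the second and dividing by ^({circumflex over ( )})EX_(n). The last line is the non-negative root of that quadratic.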

If the high-frequency component of the n-th channel compensation signal ^({circumflex over ( )})X′_(n) and the high-frequency component of the n-th channel purified decoded sound signal ^(˜)X_(n) do not cancel each other's energy when added, the value (^(˜)EX″_(n)−^(˜)EX_(n)) obtained by subtracting the high-frequency energy ^(˜)EX_(n) of the n-th channel purified decoded sound signal ^(˜)X_(n) from the high-frequency energy ^(˜)EX″_(n) of the n-th channel temporary addition signal ^(˜)X″_(n) becomes equal to the high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and thus μ_(n) becomes zero and the n-th channel high-frequency compensation gain ρ_(n) obtained by Expression (92) becomes equal to the n-th channel high-frequency compensation gain ρ_(n) obtained by Expression (91) of [[First Method for Obtaining n-th Channel High-Frequency Compensation Gain ρ_(n)]]. Further, the more the high-frequency component of the n-th channel compensation signal ^({circumflex over ( )})X′_(n) and the high-frequency component of the n-th channel purified decoded sound signal ^(˜)X_(n) cancel each other's energy when added, the further above zero μ_(n) becomes, and the n-th channel high-frequency compensation gain ρ_(n) obtained by Expression (92) becomes a value larger than the n-th channel high-frequency compensation gain ρ_(n) obtained by Expression (91) of [[First Method for Obtaining n-th Channel High-Frequency Compensation Gain ρ_(n)]].
Therefore, it can be said that in the second method, on the assumption that some cancellation of energy occurs when the high-frequency component of the n-th channel compensation signal ^({circumflex over ( )})X′_(n) and the high-frequency component of the n-th channel purified decoded sound signal ^(˜)X_(n) are added, the n-th channel high-frequency compensation gain estimation unit 211-n obtains a value larger than the value obtained by Expression (91) as the n-th channel high-frequency compensation gain ρ_(n).
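The effect described above can be checked numerically with a small sketch. The toy frames are treated as if they were already high-pass filtered, the energy of the compensation signal stands in for the high-frequency energy of the decoded signal, and all function and variable names are assumptions made for this example.

```python
import math


def energy(x):
    """Sum of squared samples of one frame."""
    return sum(v * v for v in x)


def gain_first_method(e_dec, e_pur):
    """Expression (91): sqrt(1 - e_pur / e_dec)."""
    return math.sqrt(1.0 - e_pur / e_dec)


def gain_second_method(e_dec, e_pur, e_tmp):
    """Expression (92), using the values of Expressions (92a) and (92b)."""
    rho_hat_sq = 1.0 - e_pur / e_dec      # (92a)
    mu = 1.0 - (e_tmp - e_pur) / e_dec    # (92b)
    return math.sqrt(rho_hat_sq + 0.25 * mu * mu) + 0.5 * mu


# Toy frames, assumed to contain only high-frequency content.
comp = [1.0, -1.0, 1.0, -1.0]      # compensation signal
purified = [-0.5, 0.5, 0.0, 0.0]   # partly opposite in phase to comp
tmp = [p + c for p, c in zip(purified, comp)]  # temporary addition signal

e_dec = energy(comp)      # stand-in for the decoded high-frequency energy
e_pur = energy(purified)
e_tmp = energy(tmp)

rho1 = gain_first_method(e_dec, e_pur)
rho2 = gain_second_method(e_dec, e_pur, e_tmp)
out1 = [p + rho1 * c for p, c in zip(purified, comp)]
out2 = [p + rho2 * c for p, c in zip(purified, comp)]
```

In this toy case the two signals partly cancel, so the output built with the first-method gain falls short of the target energy, while the output built with the second-method gain reaches it and the second-method gain is the larger of the two.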

Note that the n-th channel high-frequency compensation gain estimation unit 211-n may obtain the n-th channel high-frequency compensation gain ρ_(n) by the following Expression (93) or the following Expression (94) instead of Expression (92). A in Expression (94) is a predetermined positive value, and is desirably a value near one.

[Math. 48]

ρ_(n)=√{square root over ({circumflex over (ρ)}_(n) ²)}+μ_(n)  (93)

[Math. 49]

ρ_(n)=√{square root over ({circumflex over (ρ)}_(n) ²)}+Aμ _(n)  (94)
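Expressions (93) and (94) differ only in the weight applied to μ_(n), so both can be sketched with one function. The name `gain_variant`, the parameter name `a` (for A), and the clamping of the value under the square root are assumptions made for this example.

```python
import math


def gain_variant(e_dec, e_pur, e_tmp, a=1.0):
    """sqrt(rho_hat_sq) + a * mu.

    a = 1.0 corresponds to Expression (93); any other positive a
    corresponds to Expression (94).
    Assumption: rho_hat_sq is clamped at 0 so the square root stays real.
    """
    rho_hat_sq = max(0.0, 1.0 - e_pur / e_dec)
    mu = 1.0 - (e_tmp - e_pur) / e_dec
    return math.sqrt(rho_hat_sq) + a * mu
```

When no cancellation occurs, μ_(n) is zero and the result is the same for every choice of A.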

In the example of the second method described above, the n-th channelhigh-frequency compensation gain estimation unit 211-n obtains, in stepS211-21-n, the same n-th channel compensation signal^({circumflex over ( )})X′_(n) used by the n-th channel high-frequencycompensation unit 221-n. Therefore, the n-th channel high-frequencycompensation gain estimation unit 211-n may output the n-th channelcompensation signal ^({circumflex over ( )})X′_(n) obtained in stepS211-21-n, and the n-th channel compensation signal^({circumflex over ( )})X′_(n) output by the n-th channel high-frequencycompensation gain estimation unit 211-n may be input to the n-th channelhigh-frequency compensation unit 221-n instead of the n-th channeldecoded sound signal ^({circumflex over ( )})X_(n) input to the signalhigh-frequency compensation device 201. In this case, the n-th channelhigh-frequency compensation unit 221-n does not need to perform thehigh-pass filter processing for obtaining the n-th channel compensationsignal ^({circumflex over ( )})X′_(n). Conversely, the n-th channelhigh-frequency compensation unit 221-n may output the n-th channelcompensation signal ^({circumflex over ( )})X′_(n) obtained by thehigh-pass filter processing, and the n-th channel compensation signal^({circumflex over ( )})X′_(n) output by the n-th channel high-frequencycompensation unit 221-n may be input to the n-th channel high-frequencycompensation gain estimation unit 211-n. In this case, the n-th channelhigh-frequency compensation gain estimation unit 211-n does not need toperform the high-pass filter processing for obtaining the n-th channelcompensation signal ^({circumflex over ( )})X′_(n). 
Of course, thesignal high-frequency compensation device 201 may include a high-passfilter unit which is not illustrated, the high-pass filter unit may passthe n-th channel decoded sound signal ^({circumflex over ( )})X_(n)through the high-pass filter to obtain and output the n-th channelcompensation signal ^({circumflex over ( )})X′_(n), the n-th channelcompensation signal ^({circumflex over ( )})X′_(n) may be input to then-th channel high-frequency compensation gain estimation unit 211-n andthe n-th channel high-frequency compensation unit 221-n, and the n-thchannel high-frequency compensation gain estimation unit 211-n and then-th channel high-frequency compensation unit 221-n may not perform thehigh-pass filter processing for obtaining the n-th channel compensationsignal ^({circumflex over ( )})X′_(n). That is, the signalhigh-frequency compensation device 201 may employ any configuration aslong as the n-th channel high-frequency compensation gain estimationunit 211-n and the n-th channel high-frequency compensation unit 221-ncan use a signal obtained by passing the n-th channel decoded soundsignal ^({circumflex over ( )})X_(n) through the high-pass filter as then-th channel compensation signal ^({circumflex over ( )})X′_(n).

Tenth Embodiment

In a case where the monaural encoding unit 520 of the encoding device 500 performs encoding at a higher bit rate than the per-channel bit rate of the stereo encoding unit 530, the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn), which is based on the monaural decoded sound signal ^({circumflex over ( )})X_(M) obtained by the monaural decoding unit 610 of the decoding device 600, may have higher sound quality than the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) obtained by the stereo decoding unit 620 of the decoding device 600 and may therefore be more suitable as the signal used for compensation of the high frequency. Accordingly, a sound signal high-frequency compensation device of a tenth embodiment uses the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) for the compensation of the high frequency instead of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) used for the compensation of the high frequency by the sound signal high-frequency compensation device of the ninth embodiment. Hereinafter, the sound signal high-frequency compensation device of the tenth embodiment will be described, mainly in terms of its differences from the sound signal high-frequency compensation device of the ninth embodiment, using an example in which the number of channels of the stereo is two.

<<Sound Signal High-Frequency Compensation Device 202>>

As illustrated in FIG. 21 , a sound signal high-frequency compensationdevice 202 of the tenth embodiment includes a first channelhigh-frequency compensation gain estimation unit 212-1, a first channelhigh-frequency compensation unit 222-1, a second channel high-frequencycompensation gain estimation unit 212-2, and a second channelhigh-frequency compensation unit 222-2. The first channel purifieddecoded sound signal ^(˜)X₁ and the second channel purified decodedsound signal ^(˜)X₂ output by any of the sound signal purificationdevices described above, the first channel decoded sound signal^({circumflex over ( )})X₁ and the second channel decoded sound signal^({circumflex over ( )})X₂ output by the stereo decoding unit 620 of thedecoding device 600, and the first channel upmixed monaural decodedsound signal ^({circumflex over ( )})X_(M1) and the second channelupmixed monaural decoded sound signal ^({circumflex over ( )})X_(M2)output by any of the sound signal purification devices described aboveare input to the sound signal high-frequency compensation device 202.

That is, in a case where the sound signal purification device includesthe monaural decoded sound upmixing unit and obtains the upmixedmonaural decoded sound signal ^({circumflex over ( )})X_(Mn) of the eachchannel, the upmixed monaural decoded sound signal^({circumflex over ( )})X_(Mn) of the each channel obtained by themonaural decoded sound upmixing unit is output by the sound signalpurification device and input to the sound signal high-frequencycompensation device 202. Note that a case where the sound signalpurification device does not include the monaural decoded sound upmixingunit will be described later in a modification example of the tenthembodiment.

The sound signal high-frequency compensation device 202 obtains andoutputs, for the each channel of the stereo in units of frames having apredetermined time length of 20 ms, for example, a compensated decodedsound signal of the channel, which is a sound signal obtained bycompensating the high-frequency energy of the purified decoded soundsignal of the channel, by using the purified decoded sound signal of thechannel, the decoded sound signal of the channel, and the upmixedmonaural decoded sound signal of the channel. Assuming that the channelnumber n (channel index n) of the first channel is 1 and the channelnumber n of the second channel is 2, the sound signal high-frequencycompensation device 202 performs steps S212-n and S222-n illustrated inFIG. 20 for the each channel for the each frame.

[n-th Channel High-Frequency Compensation Gain Estimation Unit 212-n]

At least the n-th channel decoded sound signal^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1),^({circumflex over ( )})x_(n)(2), . . . ,^({circumflex over ( )})x_(n)(T)} input to the sound signalhigh-frequency compensation device 202 and the n-th channel purifieddecoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . ,^(˜)x_(n)(T)} input to the sound signal high-frequency compensationdevice 202 are input to the n-th channel high-frequency compensationgain estimation unit 212-n. The n-th channel high-frequency compensationgain estimation unit 212-n obtains and outputs the n-th channelhigh-frequency compensation gain ρ_(n) by using at least the n-thchannel decoded sound signal ^({circumflex over ( )})X_(n) and the n-thchannel purified decoded sound signal ^(˜)X_(n) (step S212-n). The n-thchannel high-frequency compensation gain estimation unit 212-n obtainsthe n-th channel high-frequency compensation gain ρ_(n) by, for example,the first method described in the ninth embodiment or the followingsecond method.

[[Second Method for Obtaining n-th Channel High-Frequency Compensation Gain ρ_(n)]]

The second method obtains the n-th channel compensation signal ^({circumflex over ( )})X′_(n) from the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn), instead of obtaining it from the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) as in the second method of the ninth embodiment. Therefore, in the case of using the second method, the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) input to the sound signal high-frequency compensation device 202 is also input to the n-th channel high-frequency compensation gain estimation unit 212-n, as indicated by a broken line in FIG. 21. In the second method, the n-th channel high-frequency compensation gain estimation unit 212-n obtains the n-th channel high-frequency compensation gain ρ_(n) by, for example, performing the following step S212-21-n instead of step S211-21-n of the second method of the ninth embodiment, and then performing the same steps S211-22-n and S211-23-n as those in the second method of the ninth embodiment. That is, the n-th channel high-frequency compensation gain estimation unit 212-n first passes the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) through a high-pass filter having the same characteristics as that used by the n-th channel high-frequency compensation unit 222-n to obtain the n-th channel compensation signal ^({circumflex over ( )})X′_(n)={^({circumflex over ( )})x′_(n)(1), ^({circumflex over ( )})x′_(n)(2), . . . , ^({circumflex over ( )})x′_(n)(T)} (step S212-21-n), and then performs step S211-22-n and step S211-23-n described above for the second method of the ninth embodiment.

[n-th Channel High-Frequency Compensation Unit 222-n]

The n-th channel high-frequency compensation unit 222-n obtains the n-th channel compensated decoded sound signal ^(˜)X′_(n) by using the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) instead of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) used by the n-th channel high-frequency compensation unit 221-n of the ninth embodiment. The n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1), ^({circumflex over ( )})x_(Mn)(2), . . . , ^({circumflex over ( )})x_(Mn)(T)} input to the sound signal high-frequency compensation device 202, the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)} input to the sound signal high-frequency compensation device 202, and the n-th channel high-frequency compensation gain ρ_(n) output by the n-th channel high-frequency compensation gain estimation unit 212-n are input to the n-th channel high-frequency compensation unit 222-n. The n-th channel high-frequency compensation unit 222-n obtains, as the n-th channel compensated decoded sound signal ^(˜)X′_(n)={^(˜)x′_(n)(1), ^(˜)x′_(n)(2), . . . , ^(˜)x′_(n)(T)}, the sum of the n-th channel purified decoded sound signal ^(˜)X_(n) and a signal obtained by multiplying a high-frequency component of the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) by the n-th channel high-frequency compensation gain ρ_(n), and outputs it (step S222-n).

For example, the n-th channel high-frequency compensation unit 222-n passes the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) through a high-pass filter to obtain an n-th channel compensation signal ^({circumflex over ( )})X′_(n)={^({circumflex over ( )})x′_(n)(1), ^({circumflex over ( )})x′_(n)(2), . . . , ^({circumflex over ( )})x′_(n)(T)} and, for each corresponding sample t, obtains a value ^(˜)x′_(n)(t) by adding the sample value ^(˜)x_(n)(t) of the n-th channel purified decoded sound signal ^(˜)X_(n) and the value ρ_(n)×^({circumflex over ( )})x′_(n)(t) obtained by multiplying the sample value ^({circumflex over ( )})x′_(n)(t) of the n-th channel compensation signal ^({circumflex over ( )})X′_(n) by the n-th channel high-frequency compensation gain ρ_(n), and outputs the sequence of the obtained values as the n-th channel compensated decoded sound signal ^(˜)X′_(n)={^(˜)x′_(n)(1), ^(˜)x′_(n)(2), . . . , ^(˜)x′_(n)(T)}. That is, ^(˜)x′_(n)(t)=^(˜)x_(n)(t)+ρ_(n)×^({circumflex over ( )})x′_(n)(t).

Note that, as in the ninth embodiment, in a case where the n-th channel high-frequency compensation gain estimation unit 212-n uses the method exemplified in the [[Second Method for Obtaining n-th Channel High-Frequency Compensation Gain ρ_(n)]], one of the n-th channel high-frequency compensation gain estimation unit 212-n and the n-th channel high-frequency compensation unit 222-n may pass the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) through the high-pass filter to obtain and output the n-th channel compensation signal ^({circumflex over ( )})X′_(n), and the other may use the n-th channel compensation signal ^({circumflex over ( )})X′_(n) thus obtained without itself performing the high-pass filter processing for obtaining the n-th channel compensation signal ^({circumflex over ( )})X′_(n). In addition, the sound signal high-frequency compensation device 202 may include a high-pass filter unit, which is not illustrated, the high-pass filter unit may pass the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) through the high-pass filter to obtain and output the n-th channel compensation signal ^({circumflex over ( )})X′_(n), and the n-th channel high-frequency compensation gain estimation unit 212-n and the n-th channel high-frequency compensation unit 222-n may use the n-th channel compensation signal ^({circumflex over ( )})X′_(n) obtained by the high-pass filter unit without performing the high-pass filter processing for obtaining the n-th channel compensation signal ^({circumflex over ( )})X′_(n).
That is, the sound signal high-frequency compensation device 202 may employ any configuration as long as the n-th channel high-frequency compensation gain estimation unit 212-n and the n-th channel high-frequency compensation unit 222-n can use a signal obtained by passing the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) through the high-pass filter as the n-th channel compensation signal ^({circumflex over ( )})X′_(n).

Modification Example of Tenth Embodiment

In the tenth embodiment, the case where the sound signal purification device includes the monaural decoded sound upmixing unit and obtains the upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) of each channel has been described. In a case where the sound signal purification device does not include the monaural decoded sound upmixing unit and does not obtain the upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) of each channel, the sound signal high-frequency compensation device 202 is only required to use the monaural decoded sound signal ^({circumflex over ( )})X_(M) output by the monaural decoding unit 610 of the decoding device 600 instead of the upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) of each channel that has been used in the tenth embodiment. In addition, even in a case where the sound signal purification device includes the monaural decoded sound upmixing unit and obtains the upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) of each channel, the sound signal high-frequency compensation device 202 may use the monaural decoded sound signal ^({circumflex over ( )})X_(M) output by the monaural decoding unit 610 of the decoding device 600 instead of the upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) of each channel that has been used in the tenth embodiment.

Eleventh Embodiment

Which of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) is used for the compensation of the high frequency may be selected according to the bit rate. This mode will be described as an eleventh embodiment, mainly in terms of its differences from the sound signal high-frequency compensation device of the ninth embodiment and the sound signal high-frequency compensation device of the tenth embodiment, using an example in which the number of channels of the stereo is two.

<<Sound Signal High-Frequency Compensation Device 203>>

As illustrated in FIG. 22 , the sound signal high-frequency compensationdevice 203 of the eleventh embodiment includes a first channel signalselection unit 233-1, a first channel high-frequency compensation gainestimation unit 213-1, a first channel high-frequency compensation unit223-1, a second channel signal selection unit 233-2, a second channelhigh-frequency compensation gain estimation unit 213-2, and a secondchannel high-frequency compensation unit 223-2. The first channelpurified decoded sound signal ^(˜)X₁ and the second channel purifieddecoded sound signal ^(˜)X₂ output by any one of the sound signalpurification devices described above, the first channel decoded soundsignal ^({circumflex over ( )})X₁ and the second channel decoded soundsignal ^({circumflex over ( )})X₂ output by the stereo decoding unit 620of the decoding device 600, the first channel upmixed monaural decodedsound signal ^({circumflex over ( )})X_(M1) and the second channelupmixed monaural decoded sound signal ^({circumflex over ( )})X_(M2)output by any one of the sound signal purification devices describedabove, and bit rate information are input to the sound signalhigh-frequency compensation device 203.

The bit rate information is information corresponding to the bit rates of the monaural encoding unit 520 and the monaural decoding unit 610 for each frame and information corresponding to the bit rates per channel of the stereo encoding unit 530 and the stereo decoding unit 620. The information corresponding to the bit rates of the monaural encoding unit 520 and the monaural decoding unit 610 for each frame is, for example, the number of bits b_(M) of the monaural code CM of each frame. The information corresponding to the bit rates per channel of the stereo encoding unit 530 and the stereo decoding unit 620 for each frame is, for example, the number of bits b_(n) of each channel out of the number of bits b_(s) of the stereo code CS of each frame. Note that, in a case where the number of bits b_(M) and the number of bits b_(n) are each the same in all the frames, it is not necessary to input the bit rate information to the sound signal high-frequency compensation device 203, and it is only required that the bit rate information is stored in advance in the storage unit, which is not illustrated, in the first channel signal selection unit 233-1 and in the storage unit, which is not illustrated, in the second channel signal selection unit 233-2.

The sound signal high-frequency compensation device 203 obtains andoutputs, for the each channel of the stereo in units of frames having apredetermined time length of 20 ms, for example, a compensated decodedsound signal of the channel, which is a sound signal obtained bycompensating the high-frequency energy of the purified decoded soundsignal of the channel, by using the purified decoded sound signal of thechannel, the decoded sound signal of the channel, the upmixed monauraldecoded sound signal of the channel, and the bit rate information.Assuming that the channel number n (channel index n) of the firstchannel is 1 and the channel number n of the second channel is 2, thesound signal high-frequency compensation device 203 performs stepsS233-n, S213-n, and S223-n illustrated in FIG. 23 for the each channelfor the each frame.

[n-th Channel Signal Selection Unit 233-n]

The n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} input to the sound signal high-frequency compensation device 203, the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1), ^({circumflex over ( )})x_(Mn)(2), . . . , ^({circumflex over ( )})x_(Mn)(T)} input to the sound signal high-frequency compensation device 203, and the bit rate information input to the sound signal high-frequency compensation device 203 are input to the n-th channel signal selection unit 233-n. However, in a case where the bit rate information is stored in advance in the storage unit, which is not illustrated, in the n-th channel signal selection unit 233-n, the bit rate information need not be input. In a case where the bit rates per channel of the stereo encoding unit 530 and the stereo decoding unit 620 are higher than the bit rates of the monaural encoding unit 520 and the monaural decoding unit 610, that is, in a case where b_(n) is larger than b_(M), the n-th channel signal selection unit 233-n selects the n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} and outputs the selected signal as the n-th channel selection signal ^({circumflex over ( )})X_(Sn)={^({circumflex over ( )})x_(Sn)(1), ^({circumflex over ( )})x_(Sn)(2), . . . , ^({circumflex over ( )})x_(Sn)(T)}, and in a case where the bit rates per channel of the stereo encoding unit 530 and the stereo decoding unit 620 are lower than the bit rates of the monaural encoding unit 520 and the monaural decoding unit 610, that is, in a case where b_(n) is smaller than b_(M), the n-th channel signal selection unit 233-n selects the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1), ^({circumflex over ( )})x_(Mn)(2), . . . , ^({circumflex over ( )})x_(Mn)(T)} and outputs the selected signal as the n-th channel selection signal ^({circumflex over ( )})X_(Sn)={^({circumflex over ( )})x_(Sn)(1), ^({circumflex over ( )})x_(Sn)(2), . . . , ^({circumflex over ( )})x_(Sn)(T)} (step S233-n). In a case where the bit rates of the monaural encoding unit 520 and the monaural decoding unit 610 and the bit rates per channel of the stereo encoding unit 530 and the stereo decoding unit 620 are equal, that is, in a case where b_(M) and b_(n) have the same value, the n-th channel signal selection unit 233-n may select either the n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} or the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn)={^({circumflex over ( )})x_(Mn)(1), ^({circumflex over ( )})x_(Mn)(2), . . . , ^({circumflex over ( )})x_(Mn)(T)} and output the selected signal as the n-th channel selection signal ^({circumflex over ( )})X_(Sn)={^({circumflex over ( )})x_(Sn)(1), ^({circumflex over ( )})x_(Sn)(2), . . . , ^({circumflex over ( )})x_(Sn)(T)}.
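The selection rule of step S233-n can be sketched as a single comparison of the per-frame bit counts. The function name `select_signal` and the `prefer_decoded_on_tie` flag are assumptions made for this example. The text leaves the choice open when the two bit counts are equal, so the tie-breaking default here is arbitrary.

```python
def select_signal(decoded, upmixed_mono, b_n, b_m, prefer_decoded_on_tie=True):
    """Return the n-th channel selection signal for one frame.

    decoded:      n-th channel decoded sound signal (list of samples)
    upmixed_mono: n-th channel upmixed monaural decoded sound signal
    b_n:          bits of the n-th channel within the stereo code of the frame
    b_m:          bits of the monaural code of the frame
    """
    if b_n > b_m:
        return decoded
    if b_n < b_m:
        return upmixed_mono
    # Equal bit counts: either signal may be selected (hypothetical default).
    return decoded if prefer_decoded_on_tie else upmixed_mono
```

The returned frame is then used in place of the decoded sound signal when the compensation signal is obtained by high-pass filtering.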

[n-th Channel High-Frequency Compensation Gain Estimation Unit 213-n]

At least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n)={^({circumflex over ( )})x_(n)(1), ^({circumflex over ( )})x_(n)(2), . . . , ^({circumflex over ( )})x_(n)(T)} input to the sound signal high-frequency compensation device 203 and the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)} input to the sound signal high-frequency compensation device 203 are input to the n-th channel high-frequency compensation gain estimation unit 213-n. The n-th channel high-frequency compensation gain estimation unit 213-n obtains and outputs the n-th channel high-frequency compensation gain ρ_(n) by using at least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the n-th channel purified decoded sound signal ^(˜)X_(n) (step S213-n). The n-th channel high-frequency compensation gain estimation unit 213-n obtains the n-th channel high-frequency compensation gain ρ_(n) by, for example, the first method described in the ninth embodiment or the following second method.

[[Second Method for Obtaining n-th Channel High-Frequency Compensation Gain ρ_(n)]]

In the case of using the second method, as indicated by a broken line in FIG. 22, the n-th channel selection signal ^({circumflex over ( )})X_(Sn)={^({circumflex over ( )})x_(Sn)(1), ^({circumflex over ( )})x_(Sn)(2), . . . , ^({circumflex over ( )})x_(Sn)(T)} obtained by the n-th channel signal selection unit 233-n is also input to the n-th channel high-frequency compensation gain estimation unit 213-n. In the second method, the n-th channel high-frequency compensation gain estimation unit 213-n obtains the n-th channel high-frequency compensation gain ρ_(n) by, for example, performing the following step S213-21-n instead of step S211-21-n of the second method of the ninth embodiment, and then performing the same steps S211-22-n and S211-23-n as those in the second method of the ninth embodiment. That is, the n-th channel high-frequency compensation gain estimation unit 213-n first passes the n-th channel selection signal ^({circumflex over ( )})X_(Sn) through a high-pass filter having the same characteristics as the high-pass filter used by the n-th channel high-frequency compensation unit 223-n to obtain the n-th channel compensation signal ^({circumflex over ( )})X′_(n)={^({circumflex over ( )})x′_(n)(1), ^({circumflex over ( )})x′_(n)(2), . . . , ^({circumflex over ( )})x′_(n)(T)} (step S213-21-n), and then performs step S211-22-n and step S211-23-n described in the second method of the ninth embodiment.
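
For illustration, the following Python sketch shows one way the gain could be computed once the three high-frequency energies are available. It assumes the closed form of Formula (92), reading the auxiliary quantities as ρ̂² = 1 − ˜EX/^EX and μ = 1 − (˜EX″ − ˜EX)/^EX, where ˜EX″ denotes the high-frequency energy of the temporary addition signal. The function name, the argument names, the zero return for a silent frame, and the clamping of negative intermediate values to zero are assumptions of this sketch and are not part of the described method; how the energies themselves are measured is left to the caller.

```python
import math


def estimate_gain(e_decoded: float, e_purified: float, e_temporary: float) -> float:
    """Sketch of a gain rho_n following the closed form assumed above.

    e_decoded   : high-frequency energy of the decoded signal ^X_n
    e_purified  : high-frequency energy of the purified signal ~X_n
    e_temporary : high-frequency energy of the temporary addition signal ~X''_n
    """
    if e_decoded <= 0.0:
        return 0.0  # assumption: no high-frequency energy to restore in this frame
    rho_hat_sq = 1.0 - e_purified / e_decoded            # larger when ~EX is small
    mu = 1.0 - (e_temporary - e_purified) / e_decoded    # larger when ~EX'' - ~EX is small
    radicand = max(0.0, rho_hat_sq + 0.25 * mu * mu)     # assumption: clamp before sqrt
    return max(0.0, math.sqrt(radicand) + 0.5 * mu)      # assumption: no negative gain
```

Under this reading, when the compensation signal is uncorrelated with the purified signal and carries the same high-frequency energy as the decoded signal, μ becomes 0 and the gain reduces to the square root of 1 − ˜EX/^EX.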

[n-th Channel High-Frequency Compensation Unit 223-n]

The n-th channel high-frequency compensation unit 223-n obtains the n-th channel compensated decoded sound signal ^(˜)X′_(n) using the n-th channel selection signal ^({circumflex over ( )})X_(Sn). The n-th channel selection signal ^({circumflex over ( )})X_(Sn)={^({circumflex over ( )})x_(Sn)(1), ^({circumflex over ( )})x_(Sn)(2), . . . , ^({circumflex over ( )})x_(Sn)(T)} obtained by the n-th channel signal selection unit 233-n, the n-th channel purified decoded sound signal ^(˜)X_(n)={^(˜)x_(n)(1), ^(˜)x_(n)(2), . . . , ^(˜)x_(n)(T)} input to the sound signal high-frequency compensation device 203, and the n-th channel high-frequency compensation gain ρ_(n) output by the n-th channel high-frequency compensation gain estimation unit 213-n are input to the n-th channel high-frequency compensation unit 223-n. The n-th channel high-frequency compensation unit 223-n obtains and outputs a signal obtained by adding the n-th channel purified decoded sound signal ^(˜)X_(n) and a signal obtained by multiplying the high-frequency component of the n-th channel selection signal ^({circumflex over ( )})X_(Sn) by the n-th channel high-frequency compensation gain ρ_(n), as the n-th channel compensated decoded sound signal ^(˜)X′_(n)={^(˜)x′_(n)(1), ^(˜)x′_(n)(2), . . . , ^(˜)x′_(n)(T)} (step S223-n).

For example, the n-th channel high-frequency compensation unit 223-n passes the n-th channel selection signal ^({circumflex over ( )})X_(Sn) through a high-pass filter to obtain an n-th channel compensation signal ^({circumflex over ( )})X′_(n)={^({circumflex over ( )})x′_(n)(1), ^({circumflex over ( )})x′_(n)(2), . . . , ^({circumflex over ( )})x′_(n)(T)} and, for each corresponding sample t, obtains and outputs a sequence based on a value ^(˜)x′_(n)(t) obtained by adding the sample value ^(˜)x_(n)(t) of the n-th channel purified decoded sound signal ^(˜)X_(n) and a value ρ_(n)×^({circumflex over ( )})x′_(n)(t) obtained by multiplying the n-th channel high-frequency compensation gain ρ_(n) by the sample value ^({circumflex over ( )})x′_(n)(t) of the n-th channel compensation signal ^({circumflex over ( )})X′_(n) as the n-th channel compensated decoded sound signal ^(˜)X′_(n)={^(˜)x′_(n)(1), ^(˜)x′_(n)(2), . . . , ^(˜)x′_(n)(T)}. That is, ^(˜)x′_(n)(t)=^(˜)x_(n)(t)+ρ_(n)×^({circumflex over ( )})x′_(n)(t).
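
The sample-wise operation of step S223-n can be sketched in Python as follows. The high-pass filter characteristics are not fixed by the description, so the first-order recursive filter used here, its coefficient `a`, and the function names `high_pass` and `compensate` are assumptions chosen only to keep the example self-contained.

```python
from typing import List, Sequence


def high_pass(x: Sequence[float], a: float = 0.5) -> List[float]:
    """First-order high-pass filter y[t] = a * (y[t-1] + x[t] - x[t-1]).

    Assumption: stands in for whatever high-pass filter the device uses.
    """
    y: List[float] = []
    prev_x, prev_y = 0.0, 0.0
    for sample in x:
        cur = a * (prev_y + sample - prev_x)
        y.append(cur)
        prev_x, prev_y = sample, cur
    return y


def compensate(
    x_purified: Sequence[float],  # ~X_n, purified decoded sound signal
    x_selected: Sequence[float],  # ^X_Sn, selection signal
    rho: float,                   # rho_n, high-frequency compensation gain
    a: float = 0.5,
) -> List[float]:
    """Return ~X'_n with ~x'_n(t) = ~x_n(t) + rho_n * ^x'_n(t)."""
    if len(x_purified) != len(x_selected):
        raise ValueError("both signals must contain the same number of samples")
    x_comp = high_pass(x_selected, a)  # ^X'_n, compensation signal
    return [xp + rho * xc for xp, xc in zip(x_purified, x_comp)]
```

With ρ_(n) equal to 0 the purified signal passes through unchanged, which matches the role of ρ_(n) as the amount of high-frequency component that is added back.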

Note that, as in the ninth embodiment and the tenth embodiment, in a case where the n-th channel high-frequency compensation gain estimation unit 213-n uses the method exemplified in the [[Second Method for Obtaining n-th Channel High-Frequency Compensation Gain ρ_(n)]], one of the n-th channel high-frequency compensation gain estimation unit 213-n and the n-th channel high-frequency compensation unit 223-n may pass the n-th channel selection signal ^({circumflex over ( )})X_(Sn) through the high-pass filter to obtain and output the n-th channel compensation signal ^({circumflex over ( )})X′_(n), and the other may use the n-th channel compensation signal ^({circumflex over ( )})X′_(n) obtained by the former without performing the high-pass filter processing for obtaining the n-th channel compensation signal ^({circumflex over ( )})X′_(n). In addition, the sound signal high-frequency compensation device 203 may include a high-pass filter unit, which is not illustrated, the high-pass filter unit may pass the n-th channel selection signal ^({circumflex over ( )})X_(Sn) through the high-pass filter to obtain and output the n-th channel compensation signal ^({circumflex over ( )})X′_(n), and the n-th channel high-frequency compensation gain estimation unit 213-n and the n-th channel high-frequency compensation unit 223-n may use the n-th channel compensation signal ^({circumflex over ( )})X′_(n) obtained by the high-pass filter unit without performing the high-pass filter processing for obtaining the n-th channel compensation signal ^({circumflex over ( )})X′_(n). That is, the sound signal high-frequency compensation device 203 may employ any configuration as long as the n-th channel high-frequency compensation gain estimation unit 213-n and the n-th channel high-frequency compensation unit 223-n can use a signal obtained by passing the n-th channel selection signal ^({circumflex over ( )})X_(Sn) through the high-pass filter as the n-th channel compensation signal ^({circumflex over ( )})X′_(n).

Modification Example of Eleventh Embodiment

In the eleventh embodiment, the case where the sound signal purification device includes the monaural decoded sound upmixing unit and obtains the upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) of each channel has been described. However, in a case where the sound signal purification device does not include the monaural decoded sound upmixing unit and does not obtain the upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) of each channel, the sound signal high-frequency compensation device 203 is only required to use the monaural decoded sound signal ^({circumflex over ( )})X_(M) output by the monaural decoding unit 610 of the decoding device 600 instead of the upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) of each channel that has been used in the eleventh embodiment. In addition, even in the case where the sound signal purification device includes the monaural decoded sound upmixing unit and obtains the upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) of each channel, the sound signal high-frequency compensation device 203 may use the monaural decoded sound signal ^({circumflex over ( )})X_(M) output by the monaural decoding unit 610 of the decoding device 600 instead of the upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) of each channel that has been used in the eleventh embodiment.

Twelfth Embodiment

Various modes based on the above-described embodiments and modificationexamples will be described as a twelfth embodiment.

[Number of Channels]

In each of the above-described embodiments and modification examples, the description has been given with an example of handling two channels in order to simplify the description. However, the number of channels is not limited to this, and is only required to be 2 or more. Assuming that the number of channels is N (N is an integer of 2 or more), the above-described embodiments and modification examples can be implemented by replacing two as the number of channels with N. Specifically, in each of the above-described embodiments and modification examples, each unit/step to which “-n” is attached includes N units/steps corresponding to each channel from 1 to N, and each unit/step to which a notation of a suffix or the like with “n” is attached includes N units/steps corresponding to each channel number from 1 to N, and thus a sound signal purification device with the number N of channels or a sound signal high-frequency compensation device with the number N of channels can be provided. However, a portion including the processing exemplified using the inter-channel time difference τ and the inter-channel correlation coefficient γ in each embodiment and modification example of the sound signal purification device described above may be limited to two channels.

[Sound Signal Post-Processing Device]

The sound signal purification device of any one of the first to eighth embodiments and the respective modification examples is a device that processes a sound signal obtained by decoding, and thus can be said to be a sound signal post-processing device. That is, as illustrated in FIG. 24, any of the sound signal purification devices 1101, 1102, 1103, 1201, 1202, 1203, 1301, and 1302 of the first to eighth embodiments and the respective modification examples can be said to be a sound signal post-processing device 301 (see also FIG. 25). Further, as illustrated in FIG. 24, a device including any one of the sound signal purification devices 1101, 1102, 1103, 1201, 1202, 1203, 1301, and 1302 of the first to eighth embodiments and the respective modification examples as a sound signal purification unit can be said to be the sound signal post-processing device 301.

Similarly, a device obtained by combining the sound signal purification device of any one of the first to eighth embodiments and the respective modification examples and the sound signal high-frequency compensation device of any one of the ninth to eleventh embodiments and the respective modification examples is also a device that processes a sound signal obtained by decoding, and thus can be said to be a sound signal post-processing device. That is, as illustrated in FIG. 26, a device obtained by combining any one of the sound signal purification devices 1101, 1102, 1103, 1201, 1202, 1203, 1301, and 1302 of the first to eighth embodiments and the respective modification examples and any one of the sound signal high-frequency compensation devices 201, 202, and 203 of the ninth to eleventh embodiments and the respective modification examples can be said to be a sound signal post-processing device 302 (see also FIG. 27). In addition, as illustrated in FIG. 26, a device including any one of the sound signal purification devices 1101, 1102, 1103, 1201, 1202, 1203, 1301, and 1302 of the first to eighth embodiments and the respective modification examples as a sound signal purification unit and including any one of the sound signal high-frequency compensation devices 201, 202, and 203 of the ninth to eleventh embodiments and the respective modification examples as a sound signal high-frequency compensation unit can be said to be the sound signal post-processing device 302.
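
The two post-processing configurations can be pictured as a composition of interchangeable stages. The Python sketch below is only an illustration of that structure: `post_process`, `weighted_purify`, and the callable signatures are hypothetical, and `weighted_purify` is a stand-in that uses the weighted sum ^(˜)x_(n)(t)=(1−α_(n))×^({circumflex over ( )})x_(n)(t)+α_(n)×^({circumflex over ( )})x_(M)(t) as one possible purification rule, with the weights supplied by the caller.

```python
from typing import Callable, List, Optional, Sequence

Signal = Sequence[float]
Purifier = Callable[[Sequence[Signal], Signal], List[List[float]]]
Compensator = Callable[[Sequence[Signal], Sequence[Signal], Signal], List[List[float]]]


def weighted_purify(
    decoded: Sequence[Signal], mono: Signal, alphas: Sequence[float]
) -> List[List[float]]:
    """Stand-in purification: per-channel weighted sum of ^X_n and ^X_M."""
    return [
        [(1.0 - a) * xn + a * xm for xn, xm in zip(channel, mono)]
        for channel, a in zip(decoded, alphas)
    ]


def post_process(
    decoded: Sequence[Signal],   # ^X_1 .. ^X_N from the stereo decoding unit
    mono: Signal,                # ^X_M from the monaural decoding unit
    purify: Purifier,
    compensate: Optional[Compensator] = None,
) -> List[List[float]]:
    """Purification only (device 301 style) or purification followed by
    high-frequency compensation (device 302 style), for all N channels."""
    purified = purify(decoded, mono)
    if compensate is None:
        return purified
    return compensate(decoded, purified, mono)
```

Passing the stages in as callables mirrors the statement that any one of the purification devices may be combined with any one of the high-frequency compensation devices.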

[Sound Signal Decoding Device]

The sound signal purification device of any one of the first to eighth embodiments and the respective modification examples can be included in the sound signal decoding device together with the monaural decoding unit 610 and the stereo decoding unit 620. That is, as illustrated in FIG. 28, a sound signal decoding device 601 may be configured to include the monaural decoding unit 610, the stereo decoding unit 620, and any one of the sound signal purification devices 1101, 1102, 1103, 1201, 1202, 1203, 1301, and 1302 of the first to eighth embodiments and the respective modification examples (see also FIG. 29). In addition, as illustrated in FIG. 28, in addition to the monaural decoding unit 610 and the stereo decoding unit 620, the sound signal decoding device 601 may be configured to include any one of the sound signal purification devices 1101, 1102, 1103, 1201, 1202, 1203, 1301, and 1302 of the first to eighth embodiments and the respective modification examples as a sound signal purification unit.

Similarly, a combination of the sound signal purification device of any one of the first to eighth embodiments and the respective modification examples and the sound signal high-frequency compensation device of any one of the ninth to eleventh embodiments and the respective modification examples can be included in the sound signal decoding device together with the monaural decoding unit 610 and the stereo decoding unit 620. That is, as illustrated in FIG. 30, the sound signal decoding device 602 may be configured to include the monaural decoding unit 610, the stereo decoding unit 620, any one of the sound signal purification devices 1101, 1102, 1103, 1201, 1202, 1203, 1301, and 1302 of the first to eighth embodiments and the respective modification examples, and any one of the sound signal high-frequency compensation devices 201, 202, and 203 of the ninth to eleventh embodiments and the respective modification examples (see also FIG. 31). In addition, as illustrated in FIG. 30, in addition to the monaural decoding unit 610 and the stereo decoding unit 620, the sound signal decoding device 602 may be configured to include any one of the sound signal purification devices 1101, 1102, 1103, 1201, 1202, 1203, 1301, and 1302 of the first to eighth embodiments and the respective modification examples as a sound signal purification unit, and include any one of the sound signal high-frequency compensation devices 201, 202, and 203 of the ninth to eleventh embodiments and the respective modification examples as a sound signal high-frequency compensation unit.

[Program and Recording Medium]

The processing of each unit of each device described above may be implemented by a computer, in which case the processing content of the functions that each device should have is described by a program. Then, by causing a storage unit 5020 of a computer 5000 illustrated in FIG. 33 to read this program and causing an arithmetic processing unit 5010, an input unit 5030, an output unit 5040, and the like to operate, various processing functions in the above devices are implemented on the computer.

The program describing the processing content can be recorded in a computer-readable recording medium. The computer-readable recording medium is, for example, a non-transitory recording medium and is specifically a magnetic recording device, an optical disk, or the like.

Further, distribution of the program is carried out by, for example, selling, transferring, renting, or the like of a portable recording medium such as a DVD or a CD-ROM in which the program is recorded. Furthermore, the program may be stored in a storage device of a server computer, and the program may be distributed by transferring the program from the server computer to another computer via a network.

For example, the computer that executes such a program first temporarily stores the program recorded in a portable recording medium or the program transferred from a server computer in an auxiliary recording unit 5050 that is a non-transitory storage device of the computer. Then, at the time of executing the processing, the computer reads the program stored in the auxiliary recording unit 5050, which is the non-transitory storage device of the computer, into the storage unit 5020 and executes the processing in accordance with the read program. In addition, as another embodiment of the program, the computer may directly read the program from the portable recording medium into the storage unit 5020 and execute processing in accordance with the program, and furthermore, the computer may sequentially execute processing in accordance with the received program each time the program is transferred from the server computer to the computer. Furthermore, the above-described processing may be executed by a so-called application service provider (ASP) type service that implements a processing function only by an execution instruction and result acquisition without transferring the program from the server computer to the computer. Note that the program in the present embodiment includes information that is used for processing by an electronic computer and is equivalent to the program (data or the like that is not a direct command to the computer but has a property that defines the processing of the computer).

Furthermore, while the present device is configured by executing a predetermined program on a computer in this embodiment, at least some of the processing contents may be implemented by hardware.

In addition, it goes without saying that modifications can be appropriately made without departing from the gist of the present invention. Further, the processing described in the above embodiment may be executed not only in chronological order according to the described order, but also in parallel or individually according to the processing capability of the device that executes the processing or as necessary. Furthermore, the processing described in the above embodiment may be executed not only in chronological order according to the order of description, but also in chronological order in the order opposite to the order of description in a case where the order of execution may be switched.

1. A sound signal high-frequency compensation method for obtaining, for each frame, an n-th channel compensated decoded sound signal ^(˜)X′_(n) that is a signal obtained by compensating a high frequency of an n-th channel purified decoded sound signal ^(˜)X_(n) obtained by performing signal processing in a time domain on an n-th channel decoded sound signal ^({circumflex over ( )})X_(n) (n is each integer of 1 or more and N or less) that is a decoded sound signal of each channel of stereo obtained by decoding a stereo code CS, the sound signal high-frequency compensation method comprising: an n-th channel high-frequency compensation gain estimation step of obtaining, for the each frame with respect to the each channel, an n-th channel high-frequency compensation gain ρ_(n) that is a value for bringing high-frequency energy of the n-th channel compensated decoded sound signal ^(˜)X′_(n) close to high-frequency energy of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n); and an n-th channel high-frequency compensation step of obtaining and outputting, for the each frame with respect to the each channel, a signal obtained by adding the n-th channel purified decoded sound signal ^(˜)X_(n) and a signal obtained by multiplying a high-frequency component of the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) that is a signal obtained by upmixing, for the each channel, a monaural decoded sound signal ^({circumflex over ( )})X_(M) that is obtained by decoding a monaural code CM that is a code different from the stereo code CS by the n-th channel high-frequency compensation gain ρ_(n), as the n-th channel compensated decoded sound signal ^(˜)X′_(n), wherein a signal obtained by passing the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) through a high-pass filter is used as an n-th channel compensation signal ^({circumflex over ( )})X′_(n), the n-th channel high-frequency compensation step obtains, for each corresponding sample t, a sequence based on a value ^(˜)x′_(n)(t)=^(˜)x_(n)(t)+ρ_(n)×^({circumflex over ( )})x′_(n)(t) obtained by adding a sample value ^(˜)x_(n)(t) of the n-th channel purified decoded sound signal ^(˜)X_(n) and a value ρ_(n)×^({circumflex over ( )})x′_(n)(t) obtained by multiplying the n-th channel high-frequency compensation gain ρ_(n) by a sample value ^({circumflex over ( )})x′_(n)(t) of the n-th channel compensation signal ^({circumflex over ( )})X′_(n), as the n-th channel compensated decoded sound signal ^(˜)X′_(n), and the n-th channel high-frequency compensation gain estimation step obtains, for each corresponding sample t, a sequence based on a value ^(˜)x″_(n)(t)=^(˜)x_(n)(t)+^({circumflex over ( )})x′_(n)(t) obtained by adding the sample value ^(˜)x_(n)(t) of the n-th channel purified decoded sound signal ^(˜)X_(n) and the sample value ^({circumflex over ( )})x′_(n)(t) of the n-th channel compensation signal ^({circumflex over ( )})X′_(n), as an n-th channel temporary addition signal ^(˜)X″_(n), and obtains the n-th channel high-frequency compensation gain ρ_(n) that is a value larger as high-frequency energy ^(˜)EX_(n) of the n-th channel purified decoded sound signal ^(˜)X_(n) is smaller than high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and is a value larger as a difference between the high-frequency energy of the n-th channel purified decoded sound signal ^(˜)X_(n) and high-frequency energy of the n-th channel temporary addition signal ^(˜)X″_(n) is smaller than the high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n).
2. A sound signal high-frequency compensation method for obtaining, for each frame, an n-th channel compensated decoded sound signal ^(˜)X′_(n) that is a signal obtained by compensating a high frequency of an n-th channel purified decoded sound signal ^(˜)X_(n) obtained by performing signal processing in a time domain on an n-th channel decoded sound signal ^({circumflex over ( )})X_(n) (n is each integer of 1 or more and N or less) that is a decoded sound signal of each channel of stereo obtained by decoding a stereo code CS, the sound signal high-frequency compensation method comprising: an n-th channel high-frequency compensation gain estimation step of obtaining, for the each frame with respect to the each channel, an n-th channel high-frequency compensation gain ρ_(n) that is a value for bringing high-frequency energy of the n-th channel compensated decoded sound signal ^(˜)X′_(n) close to high-frequency energy of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n); and an n-th channel high-frequency compensation step of obtaining and outputting, for the each frame with respect to the each channel, a signal obtained by adding the n-th channel purified decoded sound signal ^(˜)X_(n) and a signal obtained by multiplying a high-frequency component of a monaural decoded sound signal ^({circumflex over ( )})X_(M) that is obtained by decoding a monaural code CM that is a code different from the stereo code CS by the n-th channel high-frequency compensation gain ρ_(n), as the n-th channel compensated decoded sound signal ^(˜)X′_(n), wherein a signal obtained by passing the monaural decoded sound signal ^({circumflex over ( )})X_(M) through a high-pass filter is used as an n-th channel compensation signal ^({circumflex over ( )})X′_(n), the n-th channel high-frequency compensation step obtains, for each corresponding sample t, a sequence based on a value ^(˜)x′_(n)(t)=^(˜)x_(n)(t)+ρ_(n)×^({circumflex over ( )})x′_(n)(t) obtained by adding a sample value ^(˜)x_(n)(t) of the n-th channel purified decoded sound signal ^(˜)X_(n) and a value ρ_(n)×^({circumflex over ( )})x′_(n)(t) obtained by multiplying the n-th channel high-frequency compensation gain ρ_(n) by a sample value ^({circumflex over ( )})x′_(n)(t) of the n-th channel compensation signal ^({circumflex over ( )})X′_(n), as the n-th channel compensated decoded sound signal ^(˜)X′_(n), and the n-th channel high-frequency compensation gain estimation step obtains, for each corresponding sample t, a sequence based on a value ^(˜)x″_(n)(t)=^(˜)x_(n)(t)+^({circumflex over ( )})x′_(n)(t) obtained by adding the sample value ^(˜)x_(n)(t) of the n-th channel purified decoded sound signal ^(˜)X_(n) and the sample value ^({circumflex over ( )})x′_(n)(t) of the n-th channel compensation signal ^({circumflex over ( )})X′_(n), as an n-th channel temporary addition signal ^(˜)X″_(n), and obtains the n-th channel high-frequency compensation gain ρ_(n) that is a value larger as high-frequency energy ^(˜)EX_(n) of the n-th channel purified decoded sound signal ^(˜)X_(n) is smaller than high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and is a value larger as a difference between the high-frequency energy of the n-th channel purified decoded sound signal ^(˜)X_(n) and high-frequency energy of the n-th channel temporary addition signal ^(˜)X″_(n) is smaller than the high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n).
3. The sound signal high-frequency compensation method according to claim 1, wherein the n-th channel high-frequency compensation gain estimation step obtains the n-th channel high-frequency compensation gain ρ_(n) by
[Math. 52]
$$\rho_n=\sqrt{\hat{\rho}_n^{2}+0.25\,\mu_n^{2}}+0.5\,\mu_n \qquad (92)$$
or
[Math. 53]
$$\rho_n=\sqrt{\hat{\rho}_n^{2}}+\mu_n \qquad (93)$$
or
[Math. 54]
$$\rho_n=\sqrt{\hat{\rho}_n^{2}}+A\mu_n \qquad (94)$$
that use
[Math. 50]
$$\hat{\rho}_n^{2}=1-\frac{\widetilde{EX}_n}{\widehat{EX}_n} \qquad (92a)$$
[Math. 51]
$$\mu_n=1-\frac{\widetilde{EX}''_n-\widetilde{EX}_n}{\widehat{EX}_n} \qquad (92b)$$
where A is a predetermined positive value.
4. A sound signal post-processing method comprising the sound signal high-frequency compensation method according to claim 2 as a sound signal high-frequency compensation step, the sound signal post-processing method further comprising a sound signal purification step of performing signal processing in the time domain, wherein the sound signal purification step obtains, for the each frame, the n-th channel purified decoded sound signal ^(˜)X_(n) that is a sound signal of the each channel of the stereo by using at least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M), the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM, and the sound signal post-processing method further comprises an n-th channel signal purification step of obtaining, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ^(˜)x_(n)(t)=(1−α_(n))×^({circumflex over ( )})x_(n)(t)+α_(n)×^({circumflex over ( )})x_(M)(t) obtained by adding a value α_(n)×^({circumflex over ( )})x_(M)(t) obtained by multiplying an n-th channel purification weight α_(n) by a sample value ^({circumflex over ( )})x_(M)(t) of the monaural decoded sound signal ^({circumflex over ( )})X_(M) and a value (1−α_(n))×^({circumflex over ( )})x_(n)(t) obtained by multiplying a value (1−α_(n)) obtained by subtracting the n-th channel purification weight α_(n) from 1 by a sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), as the n-th channel purified decoded sound signal ^(˜)X_(n).
5. A sound signal post-processing method comprising the sound signal high-frequency compensation method according to claim 1 as a sound signal high-frequency compensation step, the sound signal post-processing method further comprising a sound signal purification step of performing signal processing in the time domain, wherein the sound signal purification step obtains, for the each frame, the n-th channel purified decoded sound signal ^(˜)X_(n) that is a sound signal of the each channel of the stereo by using at least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M), the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM, and the sound signal post-processing method further comprises a monaural decoded sound upmixing step of obtaining, for the each frame, an n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) that is a signal obtained by upmixing the monaural decoded sound signal ^({circumflex over ( )})X_(M) for the each channel by an upmixing process using the monaural decoded sound signal ^({circumflex over ( )})X_(M) and inter-channel relationship information that is information indicating a relationship between the channels of the stereo, and an n-th channel signal purification step of obtaining, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ^(˜)x_(n)(t)=(1−α_(n))×^({circumflex over ( )})x_(n)(t)+α_(n)×^({circumflex over ( )})x_(Mn)(t) obtained by adding a value α_(n)×^({circumflex over ( )})x_(Mn)(t) obtained by multiplying an n-th channel purification weight α_(n) by a sample value ^({circumflex over ( )})x_(Mn)(t) of the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) and a value (1−α_(n))×^({circumflex over ( )})x_(n)(t) obtained by multiplying a value (1−α_(n)) obtained by subtracting the n-th channel purification weight α_(n) from 1 by a sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), as the n-th channel purified decoded sound signal ^(˜)X_(n).
 6. A sound signal post-processing method comprising the sound signal high-frequency compensation method according to claim 2 as a sound signal high-frequency compensation step, the sound signal post-processing method further comprising a sound signal purification step of performing signal processing in the time domain, wherein the sound signal purification step obtains, for the each frame, the n-th channel purified decoded sound signal ^(˜)X_(n) that is a sound signal of the each channel of the stereo by using at least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M), the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM, and the sound signal post-processing method further comprises a decoded sound common signal estimation step of obtaining, for the each frame, a decoded sound common signal ^({circumflex over ( )})Y_(M) that is a signal common to all channels of the stereo by using at least all of one or more and N or less n-th channel decoded sound signals ^({circumflex over ( )})X_(n), a common signal purification step of obtaining, for the each frame and for each corresponding sample t, a sequence based on a value ^(˜)y_(M)(t)=(1−α_(M))×^({circumflex over ( )})y_(M)(t)+α_(M)×^({circumflex over ( )})x_(M)(t) obtained by adding a value α_(M)×^({circumflex over ( )})x_(M)(t) obtained by multiplying a common signal purification weight α_(M) by a sample value ^({circumflex over ( )})x_(M)(t) of the monaural decoded sound signal ^({circumflex over ( )})X_(M) and a value (1−α_(M))×^({circumflex over ( )})y_(M)(t) obtained by multiplying a value (1−α_(M)) obtained by subtracting the common signal purification weight α_(M) from 1 by a sample value ^({circumflex over ( )})y_(M)(t) of the decoded sound common signal ^({circumflex over ( )})Y_(M), as a purified common signal ^(˜)Y_(M), an n-th channel separation combination weight estimation step of obtaining, for the each frame with respect to the each channel n, a normalized inner product value for the decoded sound common signal ^({circumflex over ( )})Y_(M) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) as an n-th channel separation combination weight β_(n), and an n-th channel separation combination step of obtaining, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ^(˜)x_(n)(t)=^({circumflex over ( )})x_(n)(t)−β_(n)×^({circumflex over ( )})y_(M)(t)+β_(n)×^(˜)y_(M)(t) obtained by subtracting a value β_(n)×^({circumflex over ( )})y_(M)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by the sample value ^({circumflex over ( )})y_(M)(t) of the decoded sound common signal ^({circumflex over ( )})Y_(M) from a sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and adding a value β_(n)×^(˜)y_(M)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by a sample value ^(˜)y_(M)(t) of the purified common signal ^(˜)Y_(M), as the n-th channel purified decoded sound signal ^(˜)X_(n).
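By way of a non-limiting illustration, the common signal purification and the separation combination recited in claim 6 can be sketched in Python as follows. The sketch is not part of the claims. The function names are hypothetical, and two points are assumptions made only for illustration: the decoded sound common signal is taken as the plain average of the channel signals, and the normalized inner product is normalized by the energy of the decoded sound common signal.

```python
def inner(a, b):
    """Inner product of two equal-length sample sequences."""
    return sum(p * q for p, q in zip(a, b))


def common_signal(channels):
    # ASSUMPTION: the decoded sound common signal is the per-sample
    # average of all channel signals; the claim only requires that the
    # channel signals be used.
    count = len(channels)
    return [sum(samples) / count for samples in zip(*channels)]


def separation_combination(channels, x_hat_m, alpha_m):
    """Return the purified decoded sound signal of every channel for one frame.

    channels : list of per-channel decoded frames (x_hat_n)
    x_hat_m  : monaural decoded frame
    alpha_m  : common signal purification weight
    """
    y_hat = common_signal(channels)
    # Common signal purification:
    # y_tilde(t) = (1 - alpha_m) * y_hat(t) + alpha_m * x_hat_m(t)
    y_tilde = [(1.0 - alpha_m) * y + alpha_m * x
               for y, x in zip(y_hat, x_hat_m)]
    energy = inner(y_hat, y_hat)
    purified = []
    for x_hat_n in channels:
        # ASSUMPTION: normalized inner product = <x_hat_n, y_hat> / <y_hat, y_hat>,
        # with 0 used when the common signal is silent.
        beta = inner(x_hat_n, y_hat) / energy if energy > 0.0 else 0.0
        # x_tilde_n(t) = x_hat_n(t) - beta * y_hat(t) + beta * y_tilde(t)
        purified.append([x - beta * y + beta * yt
                         for x, y, yt in zip(x_hat_n, y_hat, y_tilde)])
    return purified
```

With alpha_m set to 0 the purified common signal equals the decoded sound common signal, so each channel is returned unchanged; larger weights replace more of the common component with the monaural decoded sound signal.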
 7. A sound signal post-processing method comprising the sound signal high-frequency compensation method according to claim 2 as a sound signal high-frequency compensation step, the sound signal post-processing method further comprising a sound signal purification step of performing signal processing in the time domain, wherein the sound signal purification step obtains, for the each frame, the n-th channel purified decoded sound signal ^(˜)X_(n) that is a sound signal of the each channel of the stereo by using at least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M), the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM, and the sound signal post-processing method further comprises a decoded sound common signal estimation step of obtaining, for the each frame, a decoded sound common signal ^({circumflex over ( )})Y_(M) that is a signal common to all channels of the stereo by using at least all of one or more and N or less n-th channel decoded sound signals ^({circumflex over ( )})X_(n), a common signal purification step of obtaining, for the each frame and for each corresponding sample t, a sequence based on a value ^(˜)y_(M)(t)=(1−α_(M))×^({circumflex over ( )})y_(M)(t)+α_(M)×^({circumflex over ( )})x_(M)(t) obtained by adding a value α_(M)×^({circumflex over ( )})x_(M)(t) obtained by multiplying a common signal purification weight α_(M) by a sample value ^({circumflex over ( )})x_(M)(t) of the monaural decoded sound signal ^({circumflex over ( )})X_(M) and a value (1−α_(M))×^({circumflex over ( )})y_(M)(t) obtained by multiplying a value (1−α_(M)) obtained by subtracting the common signal purification weight α_(M) from 1 by a sample value ^({circumflex over ( )})y_(M)(t) of the decoded sound common signal ^({circumflex over ( )})Y_(M), as a purified common signal ^(˜)Y_(M), a decoded sound common signal upmixing step of obtaining, for the each frame, an n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) that is a signal obtained by upmixing the decoded sound common signal ^({circumflex over ( )})Y_(M) for the each channel by an upmixing process using the decoded sound common signal ^({circumflex over ( )})Y_(M) and information indicating a relationship between the channels of the stereo, a purified common signal upmixing step of obtaining, for the each frame, an n-th channel upmixed purified signal ^(˜)Y_(Mn) that is a signal obtained by upmixing the purified common signal ^(˜)Y_(M) for the each channel by the upmixing process using the purified common signal ^(˜)Y_(M) and the information indicating the relationship between the channels of the stereo, an n-th channel separation combination weight estimation step of obtaining, for the each frame with respect to the each channel n, a normalized inner product value for the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) as an n-th channel separation combination weight β_(n), and an n-th channel separation combination step of obtaining, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ^(˜)x_(n)(t)=^({circumflex over ( )})x_(n)(t)−β_(n)×^({circumflex over ( )})y_(Mn)(t)+β_(n)×^(˜)y_(Mn)(t) obtained by subtracting a value β_(n)×^({circumflex over ( )})y_(Mn)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by a sample value ^({circumflex over ( )})y_(Mn)(t) of the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) from a sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and adding a value β_(n)×^(˜)y_(Mn)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by a sample value ^(˜)y_(Mn)(t) of the n-th channel upmixed purified signal ^(˜)Y_(Mn), as the n-th channel purified decoded sound signal ^(˜)X_(n).
 8. A sound signal post-processing method comprising the sound signal high-frequency compensation method according to claim 1 as a sound signal high-frequency compensation step, the sound signal post-processing method further comprising a sound signal purification step of performing signal processing in the time domain, wherein the sound signal purification step obtains, for the each frame, the n-th channel purified decoded sound signal ^(˜)X_(n) that is a sound signal of the each channel of the stereo by using at least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M), the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM, and the sound signal post-processing method further comprises a decoded sound common signal estimation step of obtaining, for the each frame, a decoded sound common signal ^({circumflex over ( )})Y_(M) that is a signal common to all channels of the stereo by using at least all of one or more and N or less n-th channel decoded sound signals ^({circumflex over ( )})X_(n), a decoded sound common signal upmixing step of obtaining, for the each frame, an n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) that is a signal obtained by upmixing the decoded sound common signal ^({circumflex over ( )})Y_(M) for the each channel by an upmixing process using the decoded sound common signal ^({circumflex over ( )})Y_(M) and information indicating a relationship between the channels of the stereo, a monaural decoded sound upmixing step of obtaining, for the each frame, an n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) that is a signal obtained by upmixing the monaural decoded sound signal ^({circumflex over ( )})X_(M) for the each channel by an upmixing process using the monaural decoded sound signal ^({circumflex over ( )})X_(M) and information indicating a relationship between the channels of the stereo, an n-th channel signal purification step of obtaining, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ^(˜)y_(Mn)(t)=(1−α_(Mn))×^({circumflex over ( )})y_(Mn)(t)+α_(Mn)×^({circumflex over ( )})x_(Mn)(t) obtained by adding a value α_(Mn)×^({circumflex over ( )})x_(Mn)(t) obtained by multiplying an n-th channel purification weight α_(Mn) by a sample value ^({circumflex over ( )})x_(Mn)(t) of the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) and a value (1−α_(Mn))×^({circumflex over ( )})y_(Mn)(t) obtained by multiplying a value (1−α_(Mn)) obtained by subtracting the n-th channel purification weight α_(Mn) from 1 by a sample value ^({circumflex over ( )})y_(Mn)(t) of the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn), as an n-th channel purified upmixed signal ^(˜)Y_(Mn), an n-th channel separation combination weight estimation step of obtaining, for the each frame with respect to the each channel n, a normalized inner product value for the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) as an n-th channel separation combination weight β_(n), and an n-th channel separation combination step of obtaining, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ^(˜)x_(n)(t)=^({circumflex over ( )})x_(n)(t)−β_(n)×^({circumflex over ( )})y_(Mn)(t)+β_(n)×^(˜)y_(Mn)(t) obtained by subtracting a value β_(n)×^({circumflex over ( )})y_(Mn)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by the sample value ^({circumflex over ( )})y_(Mn)(t) of the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) from a sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and adding a value β_(n)×^(˜)y_(Mn)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by a sample value ^(˜)y_(Mn)(t) of the n-th channel purified upmixed signal ^(˜)Y_(Mn), as the n-th channel purified decoded sound signal ^(˜)X_(n).
 9. A sound signal decoding method comprising the sound signal high-frequency compensation step and the sound signal purification step of the sound signal post-processing method according to claim 4, the sound signal decoding method further comprising: a stereo decoding step of decoding the stereo code CS to obtain the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) of the each channel n without using either information obtained by decoding the monaural code CM or the monaural code CM; and a monaural decoding step of decoding the monaural code CM to obtain the monaural decoded sound signal ^({circumflex over ( )})X_(M).
 10. A sound signal high-frequency compensation device for obtaining, for each frame, an n-th channel compensated decoded sound signal ^(˜)X′_(n) that is a signal obtained by compensating a high frequency of an n-th channel purified decoded sound signal ^(˜)X_(n) obtained by performing signal processing in a time domain on an n-th channel decoded sound signal ^({circumflex over ( )})X_(n) (n is each integer of 1 or more and N or less) that is a decoded sound signal of each channel of stereo obtained by decoding a stereo code CS, the sound signal high-frequency compensation device comprising: an n-th channel high-frequency compensation gain estimation circuitry configured to obtain, for the each frame with respect to the each channel, an n-th channel high-frequency compensation gain ρ_(n) that is a value for bringing high-frequency energy of the n-th channel compensated decoded sound signal ^(˜)X′_(n) close to high-frequency energy of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n); and an n-th channel high-frequency compensation circuitry configured to obtain and output, for the each frame with respect to the each channel, a signal obtained by adding the n-th channel purified decoded sound signal ^(˜)X_(n) and a signal obtained by multiplying a high-frequency component of the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) that is a signal obtained by upmixing, for the each channel, a monaural decoded sound signal ^({circumflex over ( )})X_(M) that is obtained by decoding a monaural code CM that is a code different from the stereo code CS by the n-th channel high-frequency compensation gain ρ_(n), as the n-th channel compensated decoded sound signal ^(˜)X′_(n), wherein a signal obtained by passing the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) through a high-pass filter is used as an n-th channel compensation signal ^({circumflex over ( )})X′_(n), the n-th channel high-frequency compensation circuitry obtains, for each corresponding sample t, a sequence based on a value ^(˜)x′_(n)(t)=^(˜)x_(n)(t)+ρ_(n)×^({circumflex over ( )})x′_(n)(t) obtained by adding a sample value ^(˜)x_(n)(t) of the n-th channel purified decoded sound signal ^(˜)X_(n) and a value ρ_(n)×^({circumflex over ( )})x′_(n)(t) obtained by multiplying the n-th channel high-frequency compensation gain ρ_(n) by a sample value ^({circumflex over ( )})x′_(n)(t) of the n-th channel compensation signal ^({circumflex over ( )})X′_(n), as the n-th channel compensated decoded sound signal ^(˜)X′_(n), and the n-th channel high-frequency compensation gain estimation circuitry obtains, for each corresponding sample t, a sequence based on a value ^(˜)x″_(n)(t)=^(˜)x_(n)(t)+^({circumflex over ( )})x′_(n)(t) obtained by adding the sample value ^(˜)x_(n)(t) of the n-th channel purified decoded sound signal ^(˜)X_(n) and the sample value ^({circumflex over ( )})x′_(n)(t) of the n-th channel compensation signal ^({circumflex over ( )})X′_(n), as an n-th channel temporary addition signal ^(˜)X″_(n), and obtains the n-th channel high-frequency compensation gain ρ_(n) that is a value larger as high-frequency energy ^(˜)EX_(n) of the n-th channel purified decoded sound signal ^(˜)X_(n) is smaller than high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and is a value larger as a difference between the high-frequency energy of the n-th channel purified decoded sound signal ^(˜)X_(n) and high-frequency energy of the n-th channel temporary addition signal ^(˜)X″_(n) is smaller than the high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n).
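By way of a non-limiting illustration, the gain estimation and the high-frequency compensation recited in claim 10 can be sketched in Python as follows. The sketch is not part of the claims and does not reproduce any specific formula of the description. All function names are hypothetical. The first-difference high-pass filter and the square-root energy ratio used for the gain are assumptions, chosen only because the resulting gain grows as the purified signal loses high-frequency energy relative to the decoded signal and as the energy added by the temporary addition signal becomes smaller.

```python
import math


def high_pass(x):
    # ASSUMPTION: a crude first-difference filter stands in for the
    # high-pass filter; the claim does not fix the filter design.
    return [x[t] - (x[t - 1] if t > 0 else 0.0) for t in range(len(x))]


def hf_energy(x):
    """High-frequency energy of a frame, measured after high_pass()."""
    return sum(v * v for v in high_pass(x))


def compensation_gain(x_hat_n, x_tilde_n, comp_n):
    """One hypothetical choice of the high-frequency compensation gain rho_n."""
    # Temporary addition signal: x_tilde_n(t) + comp_n(t)
    temp = [a + b for a, b in zip(x_tilde_n, comp_n)]
    e_hat = hf_energy(x_hat_n)      # decoded signal
    e_tilde = hf_energy(x_tilde_n)  # purified decoded signal
    e_temp = hf_energy(temp)        # temporary addition signal
    missing = e_hat - e_tilde       # energy lost by purification
    added = e_temp - e_tilde        # energy gained with unit gain
    if missing <= 0.0 or added <= 0.0:
        return 0.0                  # nothing to compensate, or no usable estimate
    return math.sqrt(missing / added)


def compensate(x_hat_n, x_tilde_n, source_n):
    """Return (compensated frame, rho_n).

    source_n is the signal fed to the high-pass filter: the n-th channel
    upmixed monaural decoded sound signal in claim 10.
    """
    comp_n = high_pass(source_n)
    rho = compensation_gain(x_hat_n, x_tilde_n, comp_n)
    # x_tilde'_n(t) = x_tilde_n(t) + rho_n * comp_n(t)
    return [a + rho * c for a, c in zip(x_tilde_n, comp_n)], rho
```

When the purified signal already carries as much high-frequency energy as the decoded signal, this sketch returns a gain of 0 and leaves the purified signal unchanged.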
 11. A sound signal high-frequency compensation device for obtaining, for each frame, an n-th channel compensated decoded sound signal ^(˜)X′_(n) that is a signal obtained by compensating a high frequency of an n-th channel purified decoded sound signal ^(˜)X_(n) obtained by performing signal processing in a time domain on an n-th channel decoded sound signal ^({circumflex over ( )})X_(n) (n is each integer of 1 or more and N or less) that is a decoded sound signal of each channel of stereo obtained by decoding a stereo code CS, the sound signal high-frequency compensation device comprising: an n-th channel high-frequency compensation gain estimation circuitry configured to obtain, for the each frame with respect to the each channel, an n-th channel high-frequency compensation gain ρ_(n) that is a value for bringing high-frequency energy of the n-th channel compensated decoded sound signal ^(˜)X′_(n) close to high-frequency energy of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n); and an n-th channel high-frequency compensation circuitry configured to obtain and output, for the each frame with respect to the each channel, a signal obtained by adding the n-th channel purified decoded sound signal ^(˜)X_(n) and a signal obtained by multiplying a high-frequency component of a monaural decoded sound signal ^({circumflex over ( )})X_(M) that is obtained by decoding a monaural code CM that is a code different from the stereo code CS by the n-th channel high-frequency compensation gain ρ_(n), as the n-th channel compensated decoded sound signal ^(˜)X′_(n), wherein a signal obtained by passing the monaural decoded sound signal ^({circumflex over ( )})X_(M) through a high-pass filter is used as an n-th channel compensation signal ^({circumflex over ( )})X′_(n), the n-th channel high-frequency compensation circuitry obtains, for each corresponding sample t, a sequence based on a value ^(˜)x′_(n)(t)=^(˜)x_(n)(t)+ρ_(n)×^({circumflex over ( )})x′_(n)(t) obtained by adding a sample value ^(˜)x_(n)(t) of the n-th channel purified decoded sound signal ^(˜)X_(n) and a value ρ_(n)×^({circumflex over ( )})x′_(n)(t) obtained by multiplying the n-th channel high-frequency compensation gain ρ_(n) by a sample value ^({circumflex over ( )})x′_(n)(t) of the n-th channel compensation signal ^({circumflex over ( )})X′_(n), as the n-th channel compensated decoded sound signal ^(˜)X′_(n), and the n-th channel high-frequency compensation gain estimation circuitry obtains, for each corresponding sample t, a sequence based on a value ^(˜)x″_(n)(t)=^(˜)x_(n)(t)+^({circumflex over ( )})x′_(n)(t) obtained by adding the sample value ^(˜)x_(n)(t) of the n-th channel purified decoded sound signal ^(˜)X_(n) and the sample value ^({circumflex over ( )})x′_(n)(t) of the n-th channel compensation signal ^({circumflex over ( )})X′_(n), as an n-th channel temporary addition signal ^(˜)X″_(n), and obtains the n-th channel high-frequency compensation gain ρ_(n) that is a value larger as high-frequency energy ^(˜)EX_(n) of the n-th channel purified decoded sound signal ^(˜)X_(n) is smaller than high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and is a value larger as a difference between the high-frequency energy of the n-th channel purified decoded sound signal ^(˜)X_(n) and high-frequency energy of the n-th channel temporary addition signal ^(˜)X″_(n) is smaller than the high-frequency energy ^({circumflex over ( )})EX_(n) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n).
 12. A sound signal post-processing device comprising the sound signal high-frequency compensation device according to claim 11 as a sound signal high-frequency compensation circuitry, the sound signal post-processing device further comprising a sound signal purification circuitry configured to perform signal processing in the time domain, wherein the sound signal purification circuitry obtains, for the each frame, the n-th channel purified decoded sound signal ^(˜)X_(n) that is a sound signal of the each channel of the stereo by using at least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M), the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM, and the sound signal post-processing device further comprises an n-th channel signal purification circuitry configured to obtain, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ^(˜)x_(n)(t)=(1−α_(n))×^({circumflex over ( )})x_(n)(t)+α_(n)×^({circumflex over ( )})x_(M)(t) obtained by adding a value α_(n)×^({circumflex over ( )})x_(M)(t) obtained by multiplying an n-th channel purification weight α_(n) by a sample value ^({circumflex over ( )})x_(M)(t) of the monaural decoded sound signal ^({circumflex over ( )})X_(M) and a value (1−α_(n))×^({circumflex over ( )})x_(n)(t) obtained by multiplying a value (1−α_(n)) obtained by subtracting the n-th channel purification weight α_(n) from 1 by a sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), as the n-th channel purified decoded sound signal ^(˜)X_(n).
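By way of a non-limiting illustration, the per-sample weighted addition performed by the n-th channel signal purification circuitry of claim 12 can be sketched in Python as follows. The sketch is not part of the claims. The function name is hypothetical, and restricting the purification weight to the range from 0 to 1 is an assumption made only for illustration.

```python
def purify_channel(x_hat_n, x_hat_m, alpha_n):
    """Return x_tilde_n(t) = (1 - alpha_n) * x_hat_n(t) + alpha_n * x_hat_m(t).

    x_hat_n : decoded frame of channel n
    x_hat_m : monaural decoded frame of the same length
    alpha_n : n-th channel purification weight
    """
    if len(x_hat_n) != len(x_hat_m):
        raise ValueError("frames must have the same number of samples")
    # ASSUMPTION: the weight lies in [0, 1]; the claim does not state a range.
    if not 0.0 <= alpha_n <= 1.0:
        raise ValueError("alpha_n is assumed to lie in [0, 1]")
    return [(1.0 - alpha_n) * xn + alpha_n * xm
            for xn, xm in zip(x_hat_n, x_hat_m)]
```

A weight of 0 returns the stereo-decoded channel unchanged and a weight of 1 returns the monaural decoded sound signal; intermediate weights blend the two sample by sample.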
 13. A sound signal post-processing device comprising the sound signal high-frequency compensation device according to claim 10 as a sound signal high-frequency compensation circuitry, the sound signal post-processing device further comprising a sound signal purification circuitry configured to perform signal processing in the time domain, wherein the sound signal purification circuitry obtains, for the each frame, the n-th channel purified decoded sound signal ^(˜)X_(n) that is a sound signal of the each channel of the stereo by using at least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M), the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM, and the sound signal post-processing device further comprises a monaural decoded sound upmixing circuitry configured to obtain, for the each frame, an n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) that is a signal obtained by upmixing the monaural decoded sound signal ^({circumflex over ( )})X_(M) for the each channel by an upmixing process using the monaural decoded sound signal ^({circumflex over ( )})X_(M) and inter-channel relationship information that is information indicating a relationship between the channels of the stereo, and an n-th channel signal purification circuitry configured to obtain, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ^(˜)x_(n)(t)=(1−α_(n))×^({circumflex over ( )})x_(n)(t)+α_(n)×^({circumflex over ( )})x_(Mn)(t) obtained by adding a value α_(n)×^({circumflex over ( )})x_(Mn)(t) obtained by multiplying an n-th channel purification weight α_(n) by a sample value ^({circumflex over ( )})x_(Mn)(t) of the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) and a value (1−α_(n))×^({circumflex over ( )})x_(n)(t) obtained by multiplying a value (1−α_(n)) obtained by subtracting the n-th channel purification weight α_(n) from 1 by a sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), as the n-th channel purified decoded sound signal ^(˜)X_(n).
 14. A sound signal post-processing device comprising the sound signal high-frequency compensation device according to claim 11 as a sound signal high-frequency compensation circuitry, the sound signal post-processing device further comprising a sound signal purification circuitry configured to perform signal processing in the time domain, wherein the sound signal purification circuitry obtains, for the each frame, the n-th channel purified decoded sound signal ^(˜)X_(n) that is a sound signal of the each channel of the stereo by using at least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M), the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM, and the sound signal post-processing device further comprises a decoded sound common signal estimation circuitry configured to obtain, for the each frame, a decoded sound common signal ^({circumflex over ( )})Y_(M) that is a signal common to all channels of the stereo by using at least all of one or more and N or less n-th channel decoded sound signals ^({circumflex over ( )})X_(n), a common signal purification circuitry configured to obtain, for the each frame and for each corresponding sample t, a sequence based on a value ^(˜)y_(M)(t)=(1−α_(M))×^({circumflex over ( )})y_(M)(t)+α_(M)×^({circumflex over ( )})x_(M)(t) obtained by adding a value α_(M)×^({circumflex over ( )})x_(M)(t) obtained by multiplying a common signal purification weight α_(M) by a sample value ^({circumflex over ( )})x_(M)(t) of the monaural decoded sound signal ^({circumflex over ( )})X_(M) and a value (1−α_(M))×^({circumflex over ( )})y_(M)(t) obtained by multiplying a value (1−α_(M)) obtained by subtracting the common signal purification weight α_(M) from 1 by a sample value ^({circumflex over ( )})y_(M)(t) of the decoded sound common signal ^({circumflex over ( )})Y_(M), as a purified common signal ^(˜)Y_(M), an n-th channel separation combination weight estimation circuitry configured to obtain, for the each frame with respect to the each channel n, a normalized inner product value for the decoded sound common signal ^({circumflex over ( )})Y_(M) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) as an n-th channel separation combination weight β_(n), and an n-th channel separation combination circuitry configured to obtain, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ^(˜)x_(n)(t)=^({circumflex over ( )})x_(n)(t)−β_(n)×^({circumflex over ( )})y_(M)(t)+β_(n)×^(˜)y_(M)(t) obtained by subtracting a value β_(n)×^({circumflex over ( )})y_(M)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by the sample value ^({circumflex over ( )})y_(M)(t) of the decoded sound common signal ^({circumflex over ( )})Y_(M) from a sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and adding a value β_(n)×^(˜)y_(M)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by a sample value ^(˜)y_(M)(t) of the purified common signal ^(˜)Y_(M), as the n-th channel purified decoded sound signal ^(˜)X_(n).
 15. A sound signal post-processing device comprising the sound signal high-frequency compensation device according to claim 11 as a sound signal high-frequency compensation circuitry, the sound signal post-processing device further comprising a sound signal purification circuitry configured to perform signal processing in the time domain, wherein the sound signal purification circuitry obtains, for the each frame, the n-th channel purified decoded sound signal ^(˜)X_(n) that is a sound signal of the each channel of the stereo by using at least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M), the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM, and the sound signal post-processing device further comprises a decoded sound common signal estimation circuitry configured to obtain, for the each frame, a decoded sound common signal ^({circumflex over ( )})Y_(M) that is a signal common to all channels of the stereo by using at least all of one or more and N or less n-th channel decoded sound signals ^({circumflex over ( )})X_(n), a common signal purification circuitry configured to obtain, for the each frame and for each corresponding sample t, a sequence based on a value ^(˜)y_(M)(t)=(1−α_(M))×^({circumflex over ( )})y_(M)(t)+α_(M)×^({circumflex over ( )})x_(M)(t) obtained by adding a value α_(M)×^({circumflex over ( )})x_(M)(t) obtained by multiplying a common signal purification weight α_(M) by a sample value ^({circumflex over ( )})x_(M)(t) of the monaural decoded sound signal ^({circumflex over ( )})X_(M) and a value (1−α_(M))×^({circumflex over ( )})y_(M)(t) obtained by multiplying a value (1−α_(M)) obtained by subtracting the common signal purification weight α_(M) from 1 by a sample value ^({circumflex over ( )})y_(M)(t) of the decoded sound common signal ^({circumflex over ( )})Y_(M), as a purified common signal ^(˜)Y_(M), a decoded sound common signal upmixing circuitry configured to obtain, for the each frame, an n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) that is a signal obtained by upmixing the decoded sound common signal ^({circumflex over ( )})Y_(M) for the each channel by an upmixing process using the decoded sound common signal ^({circumflex over ( )})Y_(M) and information indicating a relationship between the channels of the stereo, a purified common signal upmixing circuitry configured to obtain, for the each frame, an n-th channel upmixed purified signal ^(˜)Y_(Mn) that is a signal obtained by upmixing the purified common signal ^(˜)Y_(M) for the each channel by the upmixing process using the purified common signal ^(˜)Y_(M) and the information indicating the relationship between the channels of the stereo, an n-th channel separation combination weight estimation circuitry configured to obtain, for the each frame with respect to the each channel n, a normalized inner product value for the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) as an n-th channel separation combination weight β_(n), and an n-th channel separation combination circuitry configured to obtain, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ^(˜)x_(n)(t)=^({circumflex over ( )})x_(n)(t)−β_(n)×^({circumflex over ( )})y_(Mn)(t)+β_(n)×^(˜)y_(Mn)(t) obtained by subtracting a value β_(n)×^({circumflex over ( )})y_(Mn)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by a sample value ^({circumflex over ( )})y_(Mn)(t) of the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) from a sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n), and adding a value β_(n)×^(˜)y_(Mn)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by a sample value ^(˜)y_(Mn)(t) of the n-th channel upmixed purified signal ^(˜)Y_(Mn), as the n-th channel purified decoded sound signal ^(˜)X_(n).
 16. A sound signal post-processing device comprising the sound signal high-frequency compensation device according to claim 10 as a sound signal high-frequency compensation circuitry, the sound signal post-processing device further comprising a sound signal purification circuitry configured to perform signal processing in the time domain, wherein the sound signal purification circuitry obtains, for the each frame, the n-th channel purified decoded sound signal ^(˜)X_(n) that is a sound signal of the each channel of the stereo by using at least the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and the monaural decoded sound signal ^({circumflex over ( )})X_(M), the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM, and the sound signal post-processing device further comprises a decoded sound common signal estimation circuitry configured to obtain, for the each frame, a decoded sound common signal ^({circumflex over ( )})Y_(M) that is a signal common to all channels of the stereo by using at least all of one or more and N or less n-th channel decoded sound signals ^({circumflex over ( )})X_(n), a decoded sound common signal upmixing circuitry configured to obtain, for the each frame, an n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) that is a signal obtained by upmixing the decoded sound common signal ^({circumflex over ( )})Y_(M) for the each channel by an upmixing process using the decoded sound common signal ^({circumflex over ( )})Y_(M) and information indicating a relationship between the channels of the stereo, a monaural decoded sound upmixing circuitry configured to obtain, for the each frame, an n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) that is a signal obtained by upmixing the monaural decoded sound signal ^({circumflex over ( )})X_(M) for the each channel by an upmixing process using the monaural decoded sound signal ^({circumflex over ( )})X_(M) and information indicating a relationship between the channels of the stereo, an n-th channel signal purification circuitry configured to obtain, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ^(˜)y_(Mn)(t)=(1−α_(Mn))×^({circumflex over ( )})y_(Mn)(t)+α_(Mn)×^({circumflex over ( )})x_(Mn)(t) obtained by adding a value α_(Mn)×^({circumflex over ( )})x_(Mn)(t) obtained by multiplying an n-th channel purification weight α_(Mn) by a sample value ^({circumflex over ( )})x_(Mn)(t) of the n-th channel upmixed monaural decoded sound signal ^({circumflex over ( )})X_(Mn) and a value (1−α_(Mn))×^({circumflex over ( )})y_(Mn)(t) obtained by multiplying a value (1−α_(Mn)) obtained by subtracting the n-th channel purification weight α_(Mn) from 1 by a sample value ^({circumflex over ( )})y_(Mn)(t) of the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn), as an n-th channel purified upmixed signal ^(˜)Y_(Mn), an n-th channel separation combination weight estimation circuitry configured to obtain, for the each frame with respect to the each channel n, a normalized inner product value for the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) as an n-th channel separation combination weight β_(n), and an n-th channel separation combination circuitry configured to obtain, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ^(˜)x_(n)(t)=^({circumflex over ( )})x_(n)(t)−β_(n)×^({circumflex over ( )})y_(Mn)(t)+β_(n)×^(˜)y_(Mn)(t) obtained by subtracting a value β_(n)×^({circumflex over ( )})y_(Mn)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by the sample value ^({circumflex over ( )})y_(Mn)(t) of the n-th channel upmixed common signal ^({circumflex over ( )})Y_(Mn) from a sample value ^({circumflex over ( )})x_(n)(t) of the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) and adding a value β_(n)×^(˜)y_(Mn)(t) obtained by multiplying the n-th channel separation combination weight β_(n) by a sample value ^(˜)y_(Mn)(t) of the n-th channel purified upmixed signal ^(˜)Y_(Mn), as the n-th channel purified decoded sound signal ^(˜)X_(n).
 17. A sound signal decoding device comprising the sound signal high-frequency compensation circuitry and the sound signal purification circuitry of the sound signal post-processing device according to claim 12, the sound signal decoding device further comprising: a stereo decoding circuitry configured to decode the stereo code CS to obtain the n-th channel decoded sound signal ^({circumflex over ( )})X_(n) of the each channel n without using either information obtained by decoding the monaural code CM or the monaural code CM; and a monaural decoding circuitry configured to decode the monaural code CM to obtain the monaural decoded sound signal ^({circumflex over ( )})X_(M).
 18. (canceled)
 19. A non-transitory computer-readable recording medium recording a program for causing a computer to execute the steps of the method according to claim 1.